    MindJourney enables AI to explore simulated 3D worlds to improve spatial interpretation

    By admin | August 20, 2025


    A new research framework helps AI agents explore three-dimensional spaces they can’t directly detect. Called MindJourney, the approach addresses a key limitation in vision-language models (VLMs), which give AI agents their ability to interpret and describe visual scenes.  

    While VLMs are strong at identifying objects in static images, they struggle to interpret the interactive 3D world behind 2D images. This gap shows up in spatial questions like “If I sit on the couch that is on my right and face the chairs, will the kitchen be to my right or left?”—tasks that require an agent to interpret its position and movement through space. 
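
    To make concrete what such a question demands, here is a minimal Python sketch. It assumes a flat 2D floor plan with made-up coordinates; the positions of the couch, chairs, and kitchen and the helper name side_of are hypothetical, not taken from MindJourney. Once the agent's imagined position and facing direction are known, the side on which an object lies follows from the sign of a 2D cross product.

```python
def side_of(position, facing_target, obj):
    """Return which side obj is on, seen from position while facing facing_target.

    All arguments are (x, y) points on a hypothetical floor plan
    (x grows to the east, y grows to the north).
    """
    # Direction the agent faces, and direction from the agent to the object.
    fx, fy = facing_target[0] - position[0], facing_target[1] - position[1]
    ox, oy = obj[0] - position[0], obj[1] - position[1]
    # z-component of the 2D cross product: a positive value means the object
    # is counter-clockwise from the facing direction, i.e. on the agent's left.
    cross = fx * oy - fy * ox
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "straight ahead or behind"


# Assumed layout: the chairs are north of the couch, the kitchen is to the east.
couch = (2.0, 0.0)
chairs = (2.0, 3.0)
kitchen = (5.0, 2.0)

print(side_of(couch, chairs, kitchen))  # right
```

    The hard part for a VLM is not this arithmetic but recovering the layout from a single 2D image, which is the gap the imagined views are meant to close.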

    People overcome this challenge by mentally exploring a space, imagining moving through it and combining those mental snapshots to work out where objects are. MindJourney applies the same process to AI agents, letting them roam a virtual space before answering spatial questions. 

    How MindJourney navigates 3D space

    To perform this type of spatial navigation, MindJourney uses a world model, in this case a video generation system trained on a large collection of videos captured from a single moving viewpoint and showing actions such as going forward and turning left or right, much like a 3D cinematographer. From this, it learns to predict how a new scene would appear from different perspectives.

    At inference time, the model can generate photo-realistic images of a scene based on possible movements from the agent’s current position. It generates multiple possible views of a scene while the VLM acts as a filter, selecting the constructed perspectives that are most likely to answer the user’s question.

    These are kept and expanded in the next iteration, while less promising paths are discarded. This process, shown in Figure 1, avoids the need to generate and evaluate thousands of possible movement sequences by focusing only on the most informative perspectives.
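
    A rough count shows why this pruning matters. The Python sketch below assumes three movement primitives per step (forward, turn left, turn right, as described above); the search depth and the number of paths kept per step are hypothetical values chosen for illustration, not figures from MindJourney.

```python
def exhaustive_sequences(num_actions: int, depth: int) -> int:
    """Number of distinct movement sequences of exactly `depth` steps."""
    return num_actions ** depth


def pruned_views_generated(num_actions: int, depth: int, keep: int) -> int:
    """Views generated when at most `keep` paths survive after each step."""
    total = 0
    frontier = 1  # start from the single observed view
    for _ in range(depth):
        candidates = frontier * num_actions  # expand every surviving path
        total += candidates
        frontier = min(keep, candidates)  # discard the less promising paths
    return total


print(exhaustive_sequences(3, 8))       # 6561
print(pruned_views_generated(3, 8, 2))  # 45
```

    Under these assumptions the exhaustive count grows exponentially with depth, while the pruned count grows only linearly.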

    Figure 1. Given a spatial reasoning query, MindJourney searches through the imagined 3D space using a world model and improves the VLM’s spatial interpretation through generated observations when encountering new challenges. 

     

    To make its search through a simulated space both effective and efficient, MindJourney uses a spatial beam search—an algorithm that prioritizes the most promising paths. It works within a fixed number of steps, each representing a movement. By balancing breadth with depth, spatial beam search enables MindJourney to gather strong supporting evidence. This process is illustrated in Figure 2.
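
    The idea can be illustrated with a toy beam search in Python. This is a sketch under stated assumptions, not the MindJourney implementation: the function imagine stands in for the world model and returns a grid pose instead of a generated image, the function toy_score stands in for the VLM's judgment of how helpful a view is, and the beam width, step limit, and target location are all hypothetical.

```python
from typing import Callable, List, Tuple

Pose = Tuple[int, int, int]  # (x, y, heading); heading 0=N, 1=E, 2=S, 3=W
ACTIONS = ("forward", "turn_left", "turn_right")
STEP = {0: (0, 1), 1: (1, 0), 2: (0, -1), 3: (-1, 0)}


def imagine(pose: Pose, action: str) -> Pose:
    """Stand-in for the world model: returns the next pose, not a rendered view."""
    x, y, h = pose
    if action == "forward":
        dx, dy = STEP[h]
        return (x + dx, y + dy, h)
    if action == "turn_left":
        return (x, y, (h - 1) % 4)
    return (x, y, (h + 1) % 4)  # turn_right


def spatial_beam_search(
    start: Pose,
    score: Callable[[Pose], float],
    beam_width: int,
    max_steps: int,
) -> List[Tuple[float, List[Pose]]]:
    """Keep the `beam_width` best-scoring paths at each of `max_steps` steps."""
    beam = [(score(start), [start])]
    for _ in range(max_steps):
        candidates = []
        for _, path in beam:
            for action in ACTIONS:
                nxt = imagine(path[-1], action)
                candidates.append((score(nxt), path + [nxt]))
        # Highest score first; everything beyond the beam width is discarded.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_width]
    return beam


def toy_score(pose: Pose) -> float:
    """Stand-in for the VLM filter: closer to an assumed target spot is better."""
    target = (2, 1)
    return -(abs(pose[0] - target[0]) + abs(pose[1] - target[1]))


best_score, best_path = spatial_beam_search((0, 0, 0), toy_score, 2, 4)[0]
print(best_score, best_path[-1])
```

    The beam width trades breadth for cost: a wider beam keeps more alternative viewpoints alive, while the step limit bounds how far any imagined path can wander from the starting view.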

    Figure 2. The MindJourney workflow starts with a spatial beam search for a set number of steps before answering the query. The world model interactively generates new observations, while a VLM interprets the generated images, guiding the search throughout the process.

    By iterating through simulation, evaluation, and integration, MindJourney can reason about spatial relationships far beyond what any single 2D image can convey, all without the need for additional training. On the Spatial Aptitude Training (SAT) benchmark, it improved the accuracy of VLMs by 8% over their baseline performance.

    Building smarter agents  

    MindJourney showed strong performance on multiple 3D spatial-reasoning benchmarks, and even advanced VLMs improved when paired with its imagination loop. This suggests that the spatial patterns that world models learn from raw images, combined with the symbolic capabilities of VLMs, create a more complete spatial capability for agents. Together, they enable agents to infer what lies beyond the visible frame and interpret the physical world more accurately. 

    It also demonstrates that pretrained VLMs and trainable world models can work together in 3D without retraining either one—pointing toward general-purpose agents capable of interpreting and acting in real-world environments. This opens the way to possible applications in autonomous robotics, smart home technologies, and accessibility tools for people with visual impairments. 

    By converting systems that simply describe static images into active agents that continually evaluate where to look next, MindJourney connects computer vision with planning. Because exploration occurs entirely within the model’s latent space—its internal representation of the scene—robots would be able to test multiple viewpoints before determining their next move, potentially reducing wear, energy use, and collision risk. 

    Looking ahead, we plan to extend the framework to use world models that not only predict new viewpoints but also forecast how the scene might change over time. We envision MindJourney working alongside VLMs that interpret those predictions and use them to plan what to do next. This enhancement could enable agents to interpret spatial relationships and physical dynamics more accurately, helping them operate effectively in changing environments.
