World-modelling and embodied AI

Sub-Field 1.1.2: World-modelling and embodied AI

Despite their impressive capabilities, leading AI models understand relatively little about the world around them. This is because the text and image data they are trained on contains only indirect knowledge of physical reality. This limits their ability to both understand and act in dynamic real-world environments.

Future Horizons:

5-year horizon

Basic multimodal data is embodied in simulation environments

Researchers integrate basic multimodal data (vision, sound and language) with rudimentary world models in embodied simulation environments. Early advances in robotic reasoning and planning emerge, though with limited grounding in reality.

10-year horizon

AI agents acquire more powerful world models

AI agents acquire deeper hierarchical world models, capable of richer zero-shot generalisation and more advanced planning in open-ended tasks. Robotics platforms provide lifelong learning through continual sensory experience.

25-year horizon

Embodied AI achieves human-like capabilities

Embodied AI achieves human-like capabilities in manipulation and exploration, including advanced self-awareness and real-time learning. Grounding in physical reality and multimodal perception becomes standard, facilitating robust causal reasoning. Autonomous AI systems operate across diverse domains, dynamically adapting and self-experimenting to discover new skills and strategies. World models move towards being as flexible and abstract as those seen in biological intelligence, driving new forms of interaction between intelligent systems and the environment.

Internal world models, which allow an agent to predict environmental states given actions, are central to intelligence, supporting zero-shot generalisation and planning.[9] Embodied intelligence, meaning intelligence rooted in the connection between perception, action and reality, can be likened to evolutionary and child-development processes: the view of human children as “scientists in the crib” provides a model for active exploration and self-directed experimentation in intelligent systems. However, current text-based models lack the necessary grounding and are orders of magnitude less data-efficient than humans. Real-world, multimodal data is critical for building future models, and robotics offers a promising path: here, the integration of language, vision, memory and manipulation capabilities supports hierarchical planning and lifelong learning.[10]
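As a purely illustrative sketch of what "predicting environmental states given actions" can mean in practice, the toy Python example below builds a transition table for a five-position line world and then plans inside that table rather than in the environment itself. The environment, the function names (true_step, learn_world_model, plan) and the tabular representation are all assumptions made for illustration; real embodied systems learn approximate, high-dimensional models from sensory data.

```python
from collections import deque

# Hypothetical toy environment: positions 0..4 on a line; actions move left or right.
ACTIONS = (-1, +1)
N_STATES = 5

def true_step(state, action):
    """Real environment dynamics, which the agent can only observe by acting."""
    return min(max(state + action, 0), N_STATES - 1)

def learn_world_model():
    """Build a transition table by trying every action in every state once."""
    model = {}
    for s in range(N_STATES):
        for a in ACTIONS:
            model[(s, a)] = true_step(s, a)
    return model

def plan(model, start, goal):
    """Breadth-first search inside the learned model; returns a list of actions."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for a in ACTIONS:
            nxt = model[(state, a)]  # predicted next state, no real interaction needed
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [a]))
    return None  # goal unreachable according to the model

model = learn_world_model()
actions = plan(model, start=0, goal=3)
```

Once the table exists, the agent can evaluate candidate action sequences without touching the environment, which is the sense in which a world model supports planning. The exhaustive table is only feasible because the toy world is tiny and deterministic.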

In the long term, AI robots may develop complex self-awareness and seamlessly acquire new skills from rich sensory experience, moving closer to human-like capabilities. Safety boundaries must be set, though. For example, reinforcement-learning strategies that incentivise pure survival should be avoided to ensure that AI acts as a tool aligned with human interests. Some researchers feel that highly abstract reasoning and planning must always be reducible to linguistic or symbolic representations in order to maintain transparency, but there is debate about this. There is also debate about whether LLM-based approaches can generalise to richly structured, non-textual reality such as images and 3D worlds. Overall, embodied learning will be important for credible AI advancement.

World-modelling and embodied AI - Anticipation Scores

The Anticipation Potential of a research field is determined by the capacity for impactful action in the present, considering possible future transformative breakthroughs in a field over a 25-year outlook. A field with a high Anticipation Potential, therefore, combines the potential range of future transformative possibilities engendered by a research area with a wide field of opportunities for action in the present. We asked researchers in the field to anticipate:

  1. The uncertainty related to future science breakthroughs in the field
  2. The transformative effect anticipated breakthroughs may have on research and society
  3. The scope for action in the present in relation to anticipated breakthroughs.

This chart represents a summary of their responses to each of these elements, which, when combined, provide the Anticipation Potential for the topic. See methodology for more information.