
Invited Contribution:

Robotics and the Challenge of Embodied Intelligence

The idea that we will soon have robots capable of performing many of the same tasks as humans has become commonplace, thanks to their depiction in films, television and books. However, the design of robots that can sense and process information about their environment, then use this information to make decisions about their own behaviour, remains a major and multi-faceted research and engineering challenge.

Despite recent breakthroughs in artificial intelligence (AI), robots’ cognitive systems still lack many of the abilities that humans possess, such as robust perception, fine motor control, and adaptation to changing external conditions. For this reason, robots tend to operate most reliably within precisely calibrated and controlled environments, or for a predefined set of operations. To perform in the world at large, they will need to sense reliably and plan their actions depending on, and continuously adapting to, the external environment and their own internal state. If this can be achieved, they will be able to interact seamlessly with the environment and with other agents to accomplish a range of different tasks, much as humans do. This “embodied” intelligence would represent a genuine breakthrough: machines able to reason, based on their own perceptions, in order to achieve a desired outcome.

Embodied intelligence is based on two distinct principles. The first principle is that the way a body is built is instrumental to its perception of the external world and to its behaviour and actions. That is, the way a robot is constructed will govern what it can perceive and what it will do: having two eyes, for example, gives depth perception through stereo vision; having legs connected to the body through hips enables the robot to exploit pendulum dynamics, allowing it to walk more energy-efficiently.
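
To make the stereo-vision example concrete, here is a minimal sketch (not taken from any particular robot; the focal length and baseline values are illustrative assumptions) of how two horizontally offset eyes or cameras turn pixel disparity into depth.

```python
# Illustrative sketch: depth from stereo disparity.
# Assumes a simple pinhole-camera pair with focal length f (in pixels) and
# baseline B (in metres); the numbers below are made up for illustration.

def stereo_depth(disparity_px: float, focal_px: float = 700.0, baseline_m: float = 0.06) -> float:
    """Depth (metres) of a point seen with the given disparity between the two views."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity_px

if __name__ == "__main__":
    # A nearby object produces a large disparity, a distant one a small disparity.
    for d in (70.0, 14.0, 7.0):
        print(f"disparity {d:5.1f} px -> depth {stereo_depth(d):.2f} m")
```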

The second principle of embodied intelligence is the linking of morphology, perception and action. Current disembodied image recognition systems, based on deep learning, are trained on vast collections of online images, typically taken passively by static cameras. These images are thus completely uncorrelated with any action a robot has taken, or needs to take; the problem of perception is harder still because they carry no information about how an object sounds, feels, or changes in appearance when moved. A robot looking at the world, however, can move its sensors and interact with an object to acquire more information. It might, for example, move the object to gain a clearer view, or rotate it to obtain multiple views; it can also use additional senses, such as touch, to learn more about the object. At the same time, it can relate its actions to their sensory consequences in order to improve its execution of a specific action or particular function.
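
As an illustration of this kind of active perception, the following toy sketch (entirely invented, with a made-up observation model) has a robot maintain a belief over what an object is and choose the next view or touch that is expected to reduce its uncertainty the most.

```python
# Minimal active-perception sketch: pick the sensing action expected to
# most reduce uncertainty about an object's identity. All probabilities
# below are invented purely for illustration.
import math

CLASSES = ["mug", "bowl", "can"]
ACTIONS = ["look_front", "look_top", "touch"]

# P(distinctive feature detected | true class, action) -- made-up numbers.
P_DETECT = {
    "look_front": {"mug": 0.5, "bowl": 0.4, "can": 0.9},
    "look_top":   {"mug": 0.9, "bowl": 0.3, "can": 0.4},
    "touch":      {"mug": 0.6, "bowl": 0.9, "can": 0.5},
}

def entropy(belief):
    return -sum(p * math.log(p) for p in belief.values() if p > 0)

def update(belief, action, detected):
    # Bayesian update of the belief given the binary observation.
    post = {c: belief[c] * (P_DETECT[action][c] if detected else 1 - P_DETECT[action][c])
            for c in CLASSES}
    z = sum(post.values())
    return {c: p / z for c, p in post.items()}

def expected_entropy(belief, action):
    # Average the posterior entropy over both possible observations.
    p_det = sum(belief[c] * P_DETECT[action][c] for c in CLASSES)
    return (p_det * entropy(update(belief, action, True)) +
            (1 - p_det) * entropy(update(belief, action, False)))

belief = {c: 1 / len(CLASSES) for c in CLASSES}
best = min(ACTIONS, key=lambda a: expected_entropy(belief, a))
print("uncertainty now:", round(entropy(belief), 3))
print("most informative next action:", best)
```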

Furthermore, the interplay between the morphology of the robot and its behaviour can improve the efficiency of the overall system: for example, foveated vision allows for both high-resolution central vision and a wide field of view, while optimising resources (size, connectivity and data load), provided it is coupled with the intelligence needed to move the eyes towards salient stimuli.
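
The data savings from foveation can be sketched with a toy calculation; the image size, fovea radius and peripheral sampling stride below are chosen arbitrarily for illustration.

```python
# Illustrative foveated-sampling sketch: keep full resolution inside a small
# central "fovea" and sample the periphery sparsely, then compare the data
# load with uniform full-resolution sampling. All parameters are assumptions.

def foveated_sample_count(width=640, height=480, fovea_radius=60, peripheral_stride=8):
    cx, cy = width // 2, height // 2
    kept = 0
    for y in range(height):
        for x in range(width):
            in_fovea = (x - cx) ** 2 + (y - cy) ** 2 <= fovea_radius ** 2
            if in_fovea or (x % peripheral_stride == 0 and y % peripheral_stride == 0):
                kept += 1  # this pixel is transmitted/processed
    return kept

full = 640 * 480
kept = foveated_sample_count()
print(f"uniform sampling: {full} pixels, foveated: {kept} pixels "
      f"({100 * kept / full:.1f}% of the data load)")
```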

Those are the principles; what about the practice? The datasets currently used to train state-of-the-art AI models are mostly disembodied, and therefore do not allow us to study the role of learning in the interplay between morphology, actions, and perception. Embodiment enhances the possibility of developing intelligent and efficient systems that use less data to learn and support continual learning in new environments. So the first challenge is to understand how to encode intelligence robustly, enabling data from multiple sources to be processed on multiple timescales to support lifelong learning, adaptation and memory organisation.

Current AI methods also have immense computational resource requirements, requiring large datasets and lengthy training times, and often also expensive hardware and vast amounts of energy. That means they are not well suited to embodied AI, which will have to deal with multiple different scenarios expediently, without learning from a vast new dataset for each situation. So the second challenge is to develop embodied AI models that can efficiently adapt and learn new tasks in ever changing scenarios, using less data, time, computational resources and energy.

Recent meta-learning techniques can mitigate this problem and lead to systems that adapt to novel situations: using “one-shot” or “few-shot” learning, a system can assimilate knowledge and learn a new task from just a few examples. Even so, this barely approaches the efficiency of biological neural networks, shaped by millions of years of evolution, which are vastly more efficient than conventional computing at perception, hierarchical information extraction, adaptation, continual learning, and the memorisation of temporally structured data.
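
The flavour of such few-shot adaptation can be conveyed with a prototype-style nearest-mean classifier; the tiny hand-made “embeddings” below stand in for the features a real system would learn, and are purely illustrative.

```python
# Minimal few-shot classification sketch in the spirit of prototype-based
# meta-learning: a new class is "learned" from a handful of examples by
# averaging their embeddings, and queries go to the nearest prototype.
import math

def mean(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# A few labelled examples ("shots") per novel class -- invented 3-D embeddings.
support = {
    "cup":   [[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]],
    "plate": [[0.1, 0.9, 0.1], [0.2, 0.8, 0.2]],
}
prototypes = {label: mean(examples) for label, examples in support.items()}

query = [0.85, 0.15, 0.15]  # embedding of an unseen object
prediction = min(prototypes, key=lambda label: dist(query, prototypes[label]))
print("query classified as:", prediction)  # -> cup
```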

If we can understand how neurobiology evolved, develops, and works, we can use the same principles to give us a head start in developing neuromorphic computational systems for robotics that show robust, reliable and adaptive performance in complex, ever-changing and uncontrolled environments, as well as higher efficiency, lower latency and reduced computational cost. Multiple “technological” and computational factors underlie the brain’s capabilities, including event-driven encoding, high parallelism, and the ability of neurons to compute and store information in the same place.

Consider the first of these: event-driven encoding. Today’s conventional sensors sample and process information at single moments in time, determined by the rhythm of an internal clock, rather than being driven by external events. By contrast, biological systems sense and process changes in sensory data, sampling data as and when these changes occur. For example, if we flick a switch so a light goes off, the retina of the eye is stimulated and information about the change is transferred to the brain, rather than an ongoing succession of reports about the on or off status of the light. This event-driven sensory information is not only very rich, but requires much less computational power than a clock-driven system, and can thus allow the computational sensory system to become much more efficient than conventional AI.
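
The light-switch example can be sketched in a few lines of delta encoding; the signal and threshold below are invented for illustration.

```python
# Sketch of event-driven (delta) encoding versus clock-driven sampling:
# events are emitted only when the sensed value changes by more than a
# threshold, the way a change in light level stimulates the retina.

# A light that is on, switches off at t=40, and back on at t=120 (arbitrary units).
signal = [1.0] * 40 + [0.0] * 80 + [1.0] * 80

threshold = 0.5
events = []
last_reported = signal[0]
for t, value in enumerate(signal):
    if abs(value - last_reported) > threshold:
        events.append((t, "ON" if value > last_reported else "OFF"))
        last_reported = value

print(f"clock-driven samples: {len(signal)}")
print(f"events emitted:       {len(events)}  ->  {events}")
```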

While GPUs support a high degree of parallelism, they are still based on conventional computing infrastructure, in which memory and processing are physically separated: performing an operation requires the machine to extract the relevant information from its memory, transfer it to the processor, perform the operation, and then transfer the outcome back into memory. This data movement accounts for most of the cost of current computation. By bringing these two functions together (while also supporting highly parallel computation), as they are in the brain, neuromorphic computing has the potential to vastly increase the efficiency of the computational process itself.
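
A crude back-of-envelope sketch of the difference counts only the words that must cross the memory-processor boundary for a matrix-vector product; the layer size is assumed, and real costs depend on caches, precision and hardware details.

```python
# Back-of-envelope sketch (illustrative assumptions throughout) of why
# separating memory from compute is costly for a matrix-vector product.

def separate_memory_word_traffic(rows, cols):
    # Fetch every weight and every input element, write every output element.
    return rows * cols + cols + rows

def in_memory_word_traffic(rows, cols):
    # Weights stay where the computing happens; only inputs stream in
    # and outputs stream out.
    return cols + rows

rows, cols = 1024, 1024  # assumed layer size
print("words moved, memory separate from compute:", separate_memory_word_traffic(rows, cols))
print("words moved, compute co-located with memory:", in_memory_word_traffic(rows, cols))
```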

Neuromorphic technology is moving embodied artificial intelligence on from a static, frame-based computing system to a dynamic, event-based one that can efficiently process information and adapt to new situations. While event-driven vision sensors have reached the consumer market, neuromorphic computing platforms are still evolving and are mostly used within the research domain. In the next few years, silicon-based platforms will reach the market. At the same time, more advanced systems, with additional computational primitives and memristive devices for co-locating memory and computing, will become available to the research community at large. Novel neuromorphic computing devices are also in the making, based on flexible organic electronics, organoids, photonics and more.

Progress in neural computation, as well as in neuromorphic hardware, will be needed to build neuromorphic systems. Describing the brain’s computational processes mathematically, to produce functions that can be plugged into a processing system, is challenging, especially since we do not yet fully understand all of biological computing’s components and their computational roles within a unified conceptual framework. In a biological system, neurons not only interact with one another; the manner of their interaction also shapes their computational function, with different forms of plasticity and computation performed at different temporal and spatial scales.

As an example, while most neuromorphic computing uses point neurons, biological neurons are far more complex: dendrites, the branched, tree-like structures extending from a neuron, gather signals through contacts with other cells in their proximity and from other brain areas. The signal received depends not only on the type of chemical contact between the cells, but also on exactly where the branches of the two cells interact and the time it takes for signals to travel through them. More neuroscience research is needed to understand the computational relevance of each component of the biological system; this understanding can then be integrated into neuromorphic computing and into the development of new neuromorphic hardware platforms.
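
For comparison, a point neuron of the kind many neuromorphic chips implement can be sketched in a few lines; all constants below are illustrative assumptions, and modelling dendrites would mean adding further coupled compartments rather than this single equation.

```python
# Minimal leaky integrate-and-fire "point neuron" sketch (illustrative
# constants): the whole neuron is one membrane voltage that leaks toward
# rest, is pushed by input current, and emits a spike on crossing threshold.

def simulate_lif(input_current, dt=1.0, tau=20.0, v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Integrate a constant-parameter LIF neuron and return its spike times."""
    v = v_rest
    spikes = []
    for step, i_in in enumerate(input_current):
        # Leaky integration: decay toward rest, driven by the input.
        v += dt / tau * (-(v - v_rest) + i_in)
        if v >= v_thresh:            # threshold crossing -> emit a spike (an "event")
            spikes.append(step * dt)
            v = v_reset              # reset after the spike
    return spikes

# A step of input current switched on halfway through the simulation.
current = [0.0] * 100 + [1.5] * 100
print("spike times (ms):", simulate_lif(current))
```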

Such advances will make neuromorphic systems much more useful in relatively unconstrained real-world situations, as already demonstrated by proof-of-concept systems that can solve specific perception and control tasks and carry out simple decision-making: visual and auditory classification, sound source localisation, visual and auditory attention, basic navigation, obstacle avoidance, landmark recognition, trajectory prediction, and so on. In a few years, greater robustness and adaptability will be achieved by integrating new computational methods such as balanced excitatory-inhibitory (EI) networks, dynamic attractors, multisensory integration and lifelong learning. These will then need to be scaled up and integrated with large-scale cognitive brain models to solve complex, human-like tasks.

There is a long way to go, but neuromorphic computing for robots with embodied intelligence is essential if machines are to display the true hallmarks of cognition, and if artificial intelligence is to become meaningfully intelligent. Machines need to be aware of their environment, be aware of their own bodies, robustly interpret their own sensory data, and decide on the action that will best fulfil their task. Only then will we have robots that are truly fit for the real world.