1.1.2. Multimodal AI

    AI research has traditionally focussed on solving discrete problems that involve a single type of data, such as images, text, or audio. This has led to superhuman capabilities in narrow areas like object recognition, speech recognition and game playing. But humans and animals use multiple senses to navigate the world around them, and there is growing recognition that for AI to become more flexible it will need to work with multiple data modalities at once.

    Multimodal AI is already used in autonomous vehicles to fuse input from cameras, radar and lidar, and it shows promise in healthcare, where a wide range of physiological signals need to be considered.15,16 More recently, large language models repurposed to work with multiple data modalities have shown considerable promise. Photorealistic imagery can now be generated from simple text descriptions, the latest AI-powered chatbots can perform complex image analysis, and robots can combine visual input and natural language commands to carry out complex tasks.17,18,19
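    The sensor fusion described above is often implemented as "late fusion": each modality is encoded separately and the resulting embeddings are combined before a final prediction. The sketch below is a minimal, illustrative PyTorch version, with stand-in linear encoders and arbitrary dimensions, not the production architectures used in vehicles or hospitals.

```python
# Illustrative sketch of "late fusion" multimodal learning in PyTorch.
# The encoders here are stand-in linear layers; a real system would use
# a vision backbone for camera/lidar data and a text or signal encoder.
import torch
import torch.nn as nn

class LateFusionModel(nn.Module):
    def __init__(self, image_dim=2048, text_dim=768, hidden=512, n_classes=10):
        super().__init__()
        # Each modality gets its own encoder mapping into a shared width.
        self.image_encoder = nn.Sequential(nn.Linear(image_dim, hidden), nn.ReLU())
        self.text_encoder = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
        # Fused representation: concatenate per-modality embeddings, then classify.
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, image_feats, text_feats):
        z = torch.cat([self.image_encoder(image_feats),
                       self.text_encoder(text_feats)], dim=-1)
        return self.head(z)

model = LateFusionModel()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))  # batch of 4
print(logits.shape)  # torch.Size([4, 10])
```

    Late fusion keeps each modality's encoder independent, which makes it straightforward to add or drop a sensor; architectures that fuse earlier, for example via cross-attention, couple the modalities more tightly.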

    The ability to draw correlations between different data sources can accelerate learning and help ground the knowledge encoded by language models in the realities of the physical world.20 These approaches are data-intensive, though, and while the internet is a goldmine of text and images, finding sufficient training material for other modalities could be a barrier.
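    One common way to draw such correlations is contrastive alignment, in which paired examples (say, an image and its caption) are pushed towards similar embeddings while unpaired examples are pushed apart. Below is a minimal sketch of a CLIP-style contrastive loss, assuming PyTorch; the embedding size and temperature are arbitrary illustrative values, not those of any particular system.

```python
# Minimal sketch of contrastive alignment between two modalities
# (CLIP-style). Each row i of image_emb is assumed to be paired with
# row i of text_emb; all other rows in the batch act as negatives.
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Normalise so the dot product below is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    # Pairwise similarities: entry (i, j) compares image i with text j.
    logits = image_emb @ text_emb.t() / temperature
    # Matched pairs sit on the diagonal; train in both directions.
    targets = torch.arange(len(logits))
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```

    This style of training works best with very large batches and very large collections of paired examples, which is consistent with the data-intensity noted above.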

    Future Horizons:


    5-year horizon

    The knowledge industry is entirely disrupted

    Multimodal AI works with both text and image data to automate a wide range of tasks in knowledge-industry jobs. "Generative" AI can now produce art, images and long-form videos indistinguishable from human-made ones, disrupting the creative industries and stoking fears about misinformation. Limited data availability stymies efforts to expand into new modalities, prompting a growing focus on using AI models to create synthetic training data.

    10-year horizon

    AI works with more modalities

    More efficient algorithms and a concerted effort to collect data expand the modalities that AI can work with. This leads to breakthroughs in precision medicine. It also helps AI systems to develop deeper knowledge of the world around them, including a grasp of physical concepts and social dynamics. This allows AI to work more seamlessly and safely alongside humans, boosting the use of the technology in less structured settings like retail, care and education. However, the expanded modalities also raise the possibility of emergent capabilities such as in-context learning and episodic memory, opening up a path to artificial general intelligence.

    25-year horizon

    AI understands the world through multiple data streams

    What happens on this timescale will depend sensitively on the outcomes of the next few years of AI development. However, we can predict that more general advanced AI systems will use multiple data streams to understand the world around them, in much the same way as humans do. These streams are not limited to the five human senses: specialist AI systems use different sets of modalities depending on their task. Multimodal deep learning also accelerates scientific enquiry by allowing the simultaneous analysis of vastly different kinds of data.

    Multimodal AI - Anticipation Scores

    How the experts see this field in terms of the expected time to maturity, transformational effect across science and industries, current state of awareness among stakeholders and its possible impact on people, society and the planet. See methodology for more information.

    GESDA Best Reads and Key Resources