But progress is never certain. Every technology has the potential for harm, which is why scientists and society need to take short- and long-term risks seriously to ensure that AI fulfils its potential to advance science and benefit humanity. It is crucial that we all engage with the promises and perils that accompany this new era in scientific discovery.
At DeepMind, we use AI to solve fundamental scientific problems that can help unlock further research and benefit society. This type of foundational impact is exemplified by recent breakthroughs in protein folding. Proteins are the building blocks of life and their structure is fundamental to their function. Many of the world's greatest challenges, like developing treatments for diseases or finding enzymes that break down industrial waste, are fundamentally tied to proteins and the role they play.
However, connecting their chemical makeup — their unique sequence of amino acids — to their three-dimensional shape and ensuing biological function is complicated and time-consuming, sometimes taking years and millions of dollars. One of biology's longstanding “grand challenges”, unsolved for nearly 50 years and known as the protein folding problem, was to find a shortcut that predicts a protein's 3D structure from nothing more than its sequence of amino acids.
In 2020, DeepMind's AI system AlphaFold was recognised as a solution to this challenge by the organisers of the Critical Assessment of protein Structure Prediction (CASP) competition. By training AlphaFold on a large database of known protein structures and their associated amino acid sequences, our team developed a model that could predict the shape of a protein, at scale and in minutes, down to atomic accuracy. AlphaFold predictions have already been accessed by more than half a million researchers and used to accelerate progress on important real-world problems ranging from plastic pollution to antibiotic resistance. In partnership with EMBL's European Bioinformatics Institute (EMBL-EBI), we've now made over 200 million of these predicted structures — nearly all catalogued proteins known to science — freely available to the global scientific community, with the potential to increase humanity's understanding of biology by orders of magnitude.
We believe that protein folding is indicative of where AI can have one of the biggest impacts: decoding complex biological systems to advance our understanding of the world around us. In a glimpse of the AI-powered discoveries to come, our team has already used deep learning to predict how non-coding parts of our genome influence which of our genes get switched on and off. This could dramatically improve our understanding of how genotype influences phenotype, with major implications in research and medicine.
We have also seen success in applying AI to quantum chemistry, where it has enhanced our ability to predict how electrons will behave in molecules. This might seem esoteric, but solving this puzzle is the first step towards being able to design novel materials from the bottom up, opening the door to breakthroughs such as new high-temperature superconductors and designed-from-scratch pharmaceuticals. In mathematics, we have seen how machine learning can help mathematicians make progress in answering foundational questions. Collaborating with AI systems that expose patterns that humans have been unable to spot, researchers have been able to develop entirely new mathematical conjectures about symmetries and knots.
The next step in the relationship between AI and science will be to better connect machine learning techniques with scientific processes. The AlphaFold breakthrough came from applying machine learning to pre-existing datasets that had not been collected specifically for that purpose. In some sense, the in vitro and in silico parts of the research roadmap were separated. We are beginning to see those barriers break down, and in a wide range of disciplines, AI will become an integral part of the scientist's toolkit. Computational techniques will then be central to how science is conducted, and learning how to use them effectively will be a vital skill for anyone working in the field.
AI is many things, but it is not a panacea. It is critical that we also educate the next generation of researchers about the limitations of these tools. When we talk about artificial intelligence we have to understand where that intelligence comes from. Today's machine learning systems develop intelligence through experience. This experience comes either from training data collected by humans or from experimentation in simulators. If the data is of poor quality, or the simulator does not accurately describe the system it is meant to represent, then the AI will be ineffective at achieving its goals. There is a well-worn but important phrase in computer science: “garbage in, garbage out”. Researchers must remain aware that AI cannot be thrown at a scientific problem with the expectation of a solution. We must understand that how we apply these tools matters as much as where we apply them.
Scientists will also need to get to grips with the challenge of “problem specification”, which means deciding exactly what problem they want AI to solve for them. To understand why, consider the design of antibiotics. Set a machine learning system the goal of finding a drug that efficiently kills bacteria, and it is likely to come up with something that would also wipe out a patient's microbiome, their body's health-enhancing bacteria. Instead, the AI must be tasked with optimising its search to find a drug that selectively targets specific pathogens.
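The antibiotic example can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the candidate names, the kill fractions and the simple penalty term are invented, and it does not represent how any real drug-discovery system is built. It shows only that the same search returns a different "best" drug once the objective accounts for harm to beneficial bacteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical drug candidate with made-up, illustrative scores."""
    name: str
    pathogen_kill: float   # fraction of the target pathogen killed (0 to 1)
    commensal_kill: float  # fraction of beneficial bacteria killed (0 to 1)


def naive_objective(c: Candidate) -> float:
    # "Find a drug that efficiently kills bacteria": harm to the
    # microbiome is invisible to this objective.
    return c.pathogen_kill


def selective_objective(c: Candidate, penalty: float = 1.0) -> float:
    # Reward killing the pathogen, penalise killing beneficial bacteria.
    # The linear penalty is an arbitrary choice for this sketch.
    return c.pathogen_kill - penalty * c.commensal_kill


# Invented numbers, chosen only to make the contrast visible.
CANDIDATES = [
    Candidate("broad_spectrum", pathogen_kill=0.99, commensal_kill=0.95),
    Candidate("selective", pathogen_kill=0.90, commensal_kill=0.05),
    Candidate("weak", pathogen_kill=0.30, commensal_kill=0.02),
]

best_naive = max(CANDIDATES, key=naive_objective)
best_selective = max(CANDIDATES, key=selective_objective)

print("naive objective picks:", best_naive.name)
print("selective objective picks:", best_selective.name)
```

With these invented scores, the naive objective favours the broad-spectrum candidate, while the penalised objective favours the selective one. The search procedure is identical in both cases; only the specification of the problem has changed.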
And while AI can dramatically accelerate progress in research, it is not a substitute for scientists themselves. Researchers who use AI cannot afford to rely blindly on it, especially given that many machine learning systems remain black boxes whose inner workings can be difficult to decipher. Researchers should be aware of the limitations or biases of any AI tools they work with.