This has led to rapid improvements in AI language and coding capabilities and is also showing promise in other domains like vision.10,11 So far, transformer performance scales reliably with model size, and the largest systems demonstrate emergent capabilities that they were not explicitly trained for, such as creativity and limited reasoning.12 They have also been applied to scientific problems such as protein folding with considerable success.13 However, it is still possible that the current “scale” approach relies mainly on memorisation, as its performance is much less impressive on tasks involving mathematics and logic. There is a significant need to make trained neural networks perform better on reasoning benchmarks, and new methodologies may be required here.
A further concern is that the imperative to build ever larger models means that cutting-edge AI research is increasingly accessible only to well-funded private labs. Because these models are statistical in nature, they also readily learn biases from training data and in some cases confidently “hallucinate” facts that are not true.14 More fundamentally, these models have no memory of their previous actions, and their capabilities are fixed at training time, which means they are unable to learn continuously from their interactions. There is significant debate within the field as to whether these capabilities will emerge with greater scale, whether they can be built in explicitly, or whether we will need to move on to new architectures. Part of the problem is that the theory of machine learning still lags far behind the practice.