In the six years since it was acquired by Google, DeepMind has been rattling through a long list of artificial intelligence milestones. It has outplayed Go champions, bested professional StarCraft players and turned its attention to chess and shogi.
Aside from its work in healthcare – which in September 2019 became part of Google Health – what DeepMind hasn’t been particularly vocal about is applying its AI to more practical problems. There are some exceptions – DeepMind’s AI has already helped to make Google’s data centres more energy efficient and improve the firm’s text-to-speech systems – but most of its headline-grabbing work has focused on using games as proving grounds for AI systems.
But now DeepMind is starting to tackle one of science’s trickiest problems: protein folding. A paper published in the journal Nature details how DeepMind’s AI system was able to beat all of its opponents in a competition where algorithms predicted the structure of a protein based on its genetic makeup. Being able to predict the structure of proteins could make it much easier for us to develop new drugs, understand how genetic mutations lead to disease and develop synthetic proteins.
“They did blow the field apart,” says Paul Bates, leader of the biomolecular modelling laboratory at the Francis Crick Institute in London, and a fellow participant in the Critical Assessment of Techniques for Protein Structure Prediction (CASP) competition. “We were all a bit surprised that they did quite as well as that, since it was their first attempt in the field.”
Although the CASP results were announced in December 2018 (the process of publishing results in a scientific journal can be long and tortuous), DeepMind’s work on protein folding started two years earlier, when some team members started exploring the topic during a two-day hackathon. Part of the attraction, explains DeepMind CEO and co-founder Demis Hassabis, is that protein folding is the kind of problem that AI is uniquely well-placed to help solve.
“The problem seems to be amenable to some kind of human intuition,” he says. “If you think about bending the backbone of the protein, it’s a bit like making a move in a game.”
The protein folding field is also well set up for training artificially intelligent agents. It has a large dataset in the Protein Data Bank, a repository of the 3D structure and genetic makeup of 150,000 proteins, which was used to train DeepMind’s protein structure-predicting system, called AlphaFold. There are also simulators that helped check whether AlphaFold was correctly predicting the protein structure, as well as a useful test in the form of the CASP competition.
Protein folding is a problem that – if solved – could have a big impact on lots of areas, including drug discovery, disease research and the production of synthetic proteins. “We try and find root node problems – problems where if you solve them it would open up whole avenues of new fields for us and other people to research,” says Hassabis.
The structure of a protein signals both its function and the drugs or other molecules that are likely to interact with that protein. But right now we only have reliable structures for about half of all the proteins in the human body. Knowing a protein’s precise structure might help researchers develop drugs that very specifically target just that protein – avoiding troubling reactions with other molecules – or make it much easier to work out how a genetic mutation would hinder the functioning of a particular protein.
Understanding protein folding will also help us design our own proteins that could be used as drugs or even to digest waste plastic. “If you can achieve [protein folding prediction] then you can start designing away from nature, then you can design products that can then be more beneficial as drugs that can target proteins more specifically,” says Bates.
We already know how to determine the structure of proteins, but it involves using painstaking techniques such as X-ray crystallography and cryo-electron microscopy. Determining the structure of a single protein can take up the entirety of someone’s PhD research. Being able to predict a protein’s structure computationally could speed up and reduce the cost of research. It might also help researchers design new proteins without having to start with an existing protein. “That’s the benefit that this kind of computational approach can give you,” says Pushmeet Kohli, DeepMind’s head of research for science, robustness and reliability.
But how accurate does a protein-modelling algorithm need to be before it’s useful? DeepMind’s AI was better than the competition: for proteins modelled entirely from scratch, AlphaFold made the most accurate prediction for 25 out of 43 proteins (its nearest rival managed three). But it’s still a long way off being useful in the real world. To be accurate enough for real-world applications, AlphaFold would need to achieve a global distance test (GDT) score – the measure that the CASP test uses to rate accuracy – of between 85 and 90. As of summer 2018, AlphaFold’s overall average GDT was 63.
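For readers curious how a GDT-style score is put together, here is a minimal sketch in Python. It assumes the commonly used GDT_TS form, which averages the percentage of residues that land within 1, 2, 4 and 8 ångströms of their true positions. The function name `gdt_ts` and the toy distances are hypothetical, chosen only for illustration. The sketch also assumes the predicted and experimental structures have already been superimposed. Real scoring searches over many superpositions, so this is a simplification and not CASP’s actual procedure.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    distances: per-residue distances (in angstroms) between predicted and
    true positions, taken from a single, fixed superposition (an assumption;
    the real measure searches for the best superposition per cutoff).
    """
    if not distances:
        raise ValueError("distances must not be empty")
    n = len(distances)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    # Average the fractions and express the result as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical per-residue errors for an eight-residue toy protein.
example_distances = [0.5, 0.9, 1.5, 2.5, 3.0, 5.0, 7.0, 10.0]
score = gdt_ts(example_distances)
print(f"Simplified GDT_TS: {score:.1f}")
```

On this scale a perfect prediction scores 100, which is why the 85 to 90 range quoted above is regarded as a demanding target.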
“It’s still quite a long way to go before we can say we’ve solved [protein folding] in any meaningful way,” says Hassabis. To get there, DeepMind is planning on entering an improved AlphaFold into this year’s CASP test. If it ever gets to the required level of accuracy, then the next challenge will be to turn it into a product that people can use. “We’re still focusing on solving the problem at the moment, and then we’ll figure out how to distribute that,” he says.
“If you think about it, you realise that [DeepMind] should probably do well because they are a massive company, they are experts in machine learning, they know the tools of TensorFlow and they’ve got massive computing resources to call upon,” says Bates. The real test of AlphaFold, he says, will be how it performs in this year’s test. If it manages to improve significantly on its 2018 performance, then solving the problem of protein folding might not be quite so far away after all.