Building Blocks of Life – DeepMind to release shape database of every protein known to science

August 1, 2021

They are the building blocks of life: proteins. Can you imagine solving the protein structure prediction problem? Who would have thought that a company founded in 2010, which went on to test AI on 49 different Atari games and to build the first AI program to beat a professional Go player (a feat described as a decade ahead of its time), would now be leading the way in solving the protein folding problem? DeepMind will soon release a database of the shape of every protein known to science: more than 100 million. Its first release already covers every structured protein in the human body, as well as those of 20 research species.

In today’s Digest: what AlphaFold is, why DeepMind’s new database is being touted as “one of the most important datasets since the mapping of the Human Genome”, how this AI is changing the world, and more.

Tackling the protein folding problem

It’s a 50-year-old challenge, now being solved by DeepMind. There are about 100 million known distinct proteins, with more found all the time, and each one is unique. But how do we work out a protein’s 3D structure from its sequence of amino acids alone? That is the protein folding problem.

Scientists have long been interested in determining the structures of proteins because a protein’s form is thought to dictate its function. Once a protein’s shape is understood, its role within the cell can be guessed at, and scientists can develop drugs that work with the protein’s unique shape.

“Protein folding is a problem I’ve had my eye on for more than 20 years,” DeepMind cofounder and CEO Demis Hassabis told Technology Review. “It’s been a huge project for us. I would say this is the biggest thing we’ve done so far. And it’s the most exciting in a way, because it should have the biggest impact in the world outside of AI.”

All about AlphaFold

DeepMind has been working on this since 2016 and created an AI system known as AlphaFold. It was trained on the sequences and structures of around 100,000 known proteins. Experimental techniques for determining structures are painstakingly laborious and time-consuming, sometimes taking years and costing millions of dollars. The latest version of AlphaFold, by contrast, can predict the shape of a protein at scale and in minutes, down to atomic accuracy. This is a significant breakthrough and highlights the impact AI can have on science.

“This will be one of the most important datasets since the mapping of the Human Genome,” said Professor Ewan Birney, EMBL Deputy Director General and EMBL-EBI Director.

DeepMind has made AlphaFold’s predictions freely available to anyone in the scientific community.

The AlphaFold Protein Structure Database, created in partnership with EMBL’s European Bioinformatics Institute (EMBL-EBI), part of Europe’s flagship laboratory for the life sciences, builds on decades of painstaking work by scientists using traditional methods to determine the structure of proteins.

The first release covers over 350,000 structures, including the human proteome (all of the ~20,000 known proteins expressed in the human body), along with the proteomes of 20 additional organisms important for biological research. This release dramatically expands our knowledge of protein structures and more than doubles the number of high-accuracy human protein structures available to scientists around the world.

Even better, DeepMind will soon expand the database to cover the shape of every protein known to science: more than 100 million. The 20 research species already included range from yeast and E. coli bacteria to fruit flies and mice. Prior to the company’s AlphaFold project, which uses artificial intelligence to predict protein shapes, only 17% of the proteins in the human body had their structures identified, according to Technology Review.

These organisms are central to modern biological research, including Nobel Prize-winning discoveries and life-saving drug development. As DeepMind asks on its webpage, “What if solving one problem could unlock solutions to thousands more?”

Being used in real life

AlphaFold is already being used by DeepMind’s partners. For instance, the Drugs for Neglected Diseases initiative (DNDi) has advanced its research into life-saving cures for diseases that disproportionately affect the poorer parts of the world, and the Centre for Enzyme Innovation (CEI) at the University of Portsmouth is using AlphaFold’s predictions to help engineer faster enzymes for recycling some of our most polluting single-use plastics.

A team at the University of Colorado Boulder is finding promise in using AlphaFold predictions to study antibiotic resistance, while a group at the University of California San Francisco has used them to increase their understanding of SARS-CoV-2 biology.

“What took us months and years to do, AlphaFold was able to do in a weekend,” said Professor John McGeehan, Professor of Structural Biology and Director of the Centre for Enzyme Innovation at the University of Portsmouth.

Bottom Line

The launch of the AlphaFold Protein Structure Database offers the most complete and accurate picture of the human proteome to date, more than doubling humanity’s accumulated knowledge of high-accuracy human protein structures.

Keep an eye on this: in the coming months, DeepMind plans to vastly expand the AlphaFold Protein Structure Database to cover almost every sequenced protein known to science. Adding predicted structures for the more than 100 million proteins in the UniProt reference database, the most comprehensive resource of protein sequences, will create a veritable protein almanac of the world. The possibilities are endless, and we can’t wait to see what happens.

Category: Top Stories

Thank you for visiting the Digest.