Abstract

The functions of most proteins result from their 3D structures, but determining their structures experimentally remains a challenge, despite steady advances in crystallography, NMR and single-particle cryoEM. Computationally predicting the structure of a protein from its primary sequence has long been a grand challenge in bioinformatics, intimately connected with understanding protein chemistry and dynamics. Recent advances in deep learning, combined with the availability of genomic data for inferring co-evolutionary patterns, provide a new approach to protein structure prediction that is complementary to longstanding physics-based approaches. The outstanding performance of AlphaFold2 in the recent Critical Assessment of protein Structure Prediction (CASP14) experiment demonstrates the remarkable power of deep learning in structure prediction. In this perspective, we focus on the key features of AlphaFold2, including its use of (i) attention mechanisms and Transformers to capture long-range dependencies, (ii) symmetry principles to facilitate reasoning over protein structures in three dimensions and (iii) end-to-end differentiability as a unifying framework for learning from protein data. The rules of protein folding are ultimately encoded in the physical principles that underpin it; to conclude, the implications of having a powerful computational model for structure prediction that does not explicitly rely on those principles are discussed.

Highlights

  • Determining the 3D structure of a protein from knowledge of its primary sequence has been a fundamental problem in structural biology since Anfinsen’s classic 1961 refolding experiment, in which it was shown that the folded structure of a protein is encoded in its amino-acid sequence

  • In December 2020, the organizers of the Fourteenth Critical Assessment of Structure Prediction (CASP14) experiment made the surprising announcement that DeepMind, the London-based and Google-owned artificial intelligence research group, had ‘solved’ the protein-folding problem (The AlphaFold Team, 2020) using their AlphaFold2 algorithm (Jumper et al, 2021)

  • Venki Ramakrishnan, past president of the Royal Society and 2009 Nobel Laureate, concluded that ‘[DeepMind’s] work represents a stunning advance on the protein-folding problem, a 50-year-old grand challenge in biology’ (The AlphaFold Team, 2020)

Summary

Introduction

Determining the 3D structure of a protein from knowledge of its primary (amino-acid) sequence has been a fundamental problem in structural biology since Anfinsen’s classic 1961 refolding experiment, which showed that the folded structure of a protein is encoded in its amino-acid sequence (with important exceptions; Anfinsen et al, 1961). Given a sufficiently accurate energy model, for example a general solution of the all-atom Schrödinger equation, solving protein folding reduces to simulating the dynamical equations of motion of a polypeptide in solution until the lowest free-energy state is reached. A separate, practical question is that of structure prediction: the design of algorithms that can predict the native state from the primary sequence of a protein given all available data, including the structures of related proteins or protein folds, homologous protein sequences and knowledge of polypeptide chemistry and geometry. Such protein structure-prediction algorithms often involve physical principles, but recent work with machine-learning algorithms has shown that this need not be the case (AlQuraishi, 2019b). Throughout our discussion, we turn to ideas from physics to ground our intuition.
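
The energy-model view of folding can be illustrated with a toy sketch. The Python below is not a protein force field and is not how AlphaFold2 works. It assumes a hypothetical four-bead chain in 2D with harmonic bonds and a single harmonic 'contact' between the two end beads, and it replaces real molecular dynamics with plain gradient descent (overdamped, zero-temperature dynamics). All function names and parameter values (energy, gradient, relax, k_bond, k_contact, step) are illustrative assumptions.

```python
import math

def springs(n_beads, contacts, k_bond=10.0, k_contact=1.0):
    """List (i, j, stiffness) for chain bonds and extra contacts (toy model)."""
    bonds = [(i, i + 1, k_bond) for i in range(n_beads - 1)]
    return bonds + [(i, j, k_contact) for i, j in contacts]

def energy(coords, contacts):
    """Sum of harmonic terms 0.5 * k * (d - 1)^2, with rest length 1."""
    total = 0.0
    for i, j, k in springs(len(coords), contacts):
        d = math.dist(coords[i], coords[j])
        total += 0.5 * k * (d - 1.0) ** 2
    return total

def gradient(coords, contacts):
    """Analytic gradient of energy() with respect to every bead coordinate."""
    grad = [[0.0, 0.0] for _ in coords]
    for i, j, k in springs(len(coords), contacts):
        dx = coords[i][0] - coords[j][0]
        dy = coords[i][1] - coords[j][1]
        d = math.hypot(dx, dy)
        if d == 0.0:
            continue  # direction undefined for coincident beads; skip
        scale = k * (d - 1.0) / d
        grad[i][0] += scale * dx
        grad[i][1] += scale * dy
        grad[j][0] -= scale * dx
        grad[j][1] -= scale * dy
    return grad

def relax(coords, contacts, step=0.01, n_steps=5000):
    """Follow the negative energy gradient for a fixed number of steps."""
    coords = [list(p) for p in coords]
    for _ in range(n_steps):
        for p, g in zip(coords, gradient(coords, contacts)):
            p[0] -= step * g[0]
            p[1] -= step * g[1]
    return coords

# Slightly bent, extended chain; the contact pulls the two ends together.
start = [(0.0, 0.0), (1.0, 0.2), (2.0, 0.2), (3.0, 0.0)]
contacts = [(0, 3)]
final = relax(start, contacts)

print(f"energy before: {energy(start, contacts):.4f}")
print(f"energy after:  {energy(final, contacts):.4f}")
print(f"end-to-end distance after: {math.dist(final[0], final[-1]):.4f}")
```

In this toy landscape, simple descent from the bent starting chain is enough to reach a low-energy, compact state. For a real polypeptide the energy landscape is rugged and the conformational search space is vast, so direct simulation of this kind is far more costly.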

From Go to proteins
Massive search space
Well-defined objective function
Large amounts of data
AlphaFold2 at CASP14
Sequence ensembles and evolution-based models
Equivariance and the structure module
End-to-end differentiability and the principle of unification
Findings
Interpretability in machine-learned protein models