Abstract

The degree to which evolution is predictable is a fundamental question in biology. Previous attempts to predict the evolution of protein sequences have been limited to specific proteins and to small changes, such as single-residue mutations. Here, we demonstrate that by using a protein language model to predict the local evolution within protein families, we recover a dynamic "vector field" of protein evolution that we call evolutionary velocity (evo-velocity). Evo-velocity generalizes to evolution over vastly different timescales, from viral proteins evolving over years to eukaryotic proteins evolving over geologic eons, and can predict the evolutionary dynamics of proteins that were not used to develop the original model. Evo-velocity also yields new evolutionary insights by predicting strategies of viral-host immune escape, resolving conflicting theories on the evolution of serpins, and revealing a key role of horizontal gene transfer in the evolution of eukaryotic glycolysis.
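
As a rough, hedged illustration of the local-scoring idea behind evo-velocity (not the authors' implementation), the Python sketch below assigns each pair of nearby sequences a directed "velocity" equal to the difference in a language-model likelihood score. Here, pseudo_log_likelihood is a toy placeholder standing in for a real protein language model, and the example sequences are arbitrary.

    # Minimal sketch of the local-scoring idea behind evo-velocity, not the
    # authors' implementation. `pseudo_log_likelihood` is a toy placeholder for
    # a protein language model's sequence score so the example runs end to end.
    import itertools

    def pseudo_log_likelihood(seq: str) -> float:
        # Placeholder: in practice, sum per-residue log-probabilities from a
        # masked protein language model.
        return -sum(ord(c) % 7 for c in seq) / len(seq)

    def evo_velocity_edges(sequences, max_dist=3):
        """Directed 'velocity' between nearby sequences.

        For each pair within Hamming distance `max_dist`, the edge score is
        the difference in model likelihood; positive scores point from the
        less likely sequence toward the more likely one.
        """
        scores = {s: pseudo_log_likelihood(s) for s in sequences}
        edges = []
        for a, b in itertools.combinations(sequences, 2):
            if len(a) == len(b) and sum(x != y for x, y in zip(a, b)) <= max_dist:
                edges.append((a, b, scores[b] - scores[a]))
        return edges

    seqs = ["MKTAYIAK", "MKTAYIAR", "MKTAYLAR", "MKSAYLAR"]
    for a, b, v in evo_velocity_edges(seqs):
        print(f"{a} -> {b}: velocity {v:+.3f}")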

Highlights

  • A longstanding open question in biology is whether evolution is predictable or fundamentally random (Gould, 1990; Lässig et al., 2017; Morris, 2003; de Visser and Krug, 2014)

  • We show how the evolutionary predictability enabled by a single, large language model (Box 1) provides a new method for recovering the dynamic trajectories of protein evolution that we refer to as "evolutionary velocity," or "evo-velocity." Evo-velocity is conceptually inspired by work in theoretical biology that understands evolution as a path traversing a "fitness landscape" based on locally optimal decisions (Smith, 1970; Wright, 1932); while inspired by traditional fitness landscapes, evo-velocity is a distinct concept, as explained in the first paragraph of the discussion

  • Our key conceptual advance is that by learning the rules underlying local evolution, we can construct a global evolutionary "vector field" that we show can (1) predict the root of observed evolutionary trajectories, (2) order protein sequences in evolutionary time, and (3) identify the mutational strategies that drive these trajectories (see the sketch after this list)
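
The sketch below illustrates, under simplifying assumptions, how per-edge velocity scores of the kind produced above could be aggregated to predict a root and a pseudotemporal ordering. The edge list and node names are hypothetical illustrations, not the authors' algorithm or data.

    # Hedged sketch: given directed velocity scores between sequences (as in
    # the sketch above), rank sequences in evolutionary time by net velocity.
    # The edge list and node names are hypothetical illustrations.
    import numpy as np

    # (source, target, velocity); positive velocity points source -> target
    edges = [
        ("s0", "s1", +0.8), ("s1", "s2", +0.5),
        ("s0", "s2", +1.1), ("s2", "s3", +0.4),
    ]
    nodes = sorted({n for e in edges for n in e[:2]})
    idx = {n: i for i, n in enumerate(nodes)}

    net = np.zeros(len(nodes))
    for a, b, v in edges:
        net[idx[a]] -= v  # outgoing velocity: earlier in the trajectory
        net[idx[b]] += v  # incoming velocity: later in the trajectory

    order = [nodes[i] for i in np.argsort(net)]
    print("predicted root:", order[0])
    print("pseudotemporal ordering:", order)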

Introduction

A longstanding open question in biology is whether evolution is predictable or fundamentally random (Gould, 1990; Lässig et al., 2017; Morris, 2003; de Visser and Krug, 2014). Learning the rules that constrain evolution could enable some amount of evolutionary predictability. Biological complexity (for example, due to the combinatorial complexity of interactions among protein residues) makes learning these rules a considerable challenge (Bloom et al., 2006; Gong et al., 2013; Smith, 1970; Wright, 1932). Promising advances in machine learning have improved the ability of a class of algorithms called language models to learn the rules that govern how amino acids can appear together to form a protein sequence (Alley et al., 2019; Bepler and Berger, 2019, 2021; Hie et al., 2021; Hsu et al., 2022; Rao et al., 2019; Rives et al., 2021; Madani et al., 2021). To date, however, language models have been applied only to modeling local evolution, such as the effects of single-residue mutations, rather than the more complex changes that accumulate over long evolutionary trajectories.
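
For context on what "local evolution" scoring looks like in practice, the hedged sketch below estimates the effect of a single-residue mutation as a masked-language-model log-likelihood ratio. It assumes the fair-esm package and its published ESM-1b interface; the sequence and mutation are arbitrary illustrations rather than examples from the paper.

    # Hedged sketch of the "local evolution" setting: scoring a single-residue
    # mutation as a masked-language-model log-likelihood ratio. Assumes the
    # fair-esm package (pip install fair-esm); the sequence and mutation are
    # arbitrary illustrations, not examples from the paper.
    import torch
    import esm

    model, alphabet = esm.pretrained.esm1b_t33_650M_UR50S()
    batch_converter = alphabet.get_batch_converter()
    model.eval()

    wild_type = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
    pos, mut_aa = 10, "W"                      # 0-indexed site to mutate
    wt_aa = wild_type[pos]

    _, _, tokens = batch_converter([("wt", wild_type)])
    tokens[0, pos + 1] = alphabet.mask_idx     # +1 accounts for the BOS token

    with torch.no_grad():
        logits = model(tokens)["logits"]
    log_probs = torch.log_softmax(logits[0, pos + 1], dim=-1)

    # Positive scores suggest the model favors the mutant over the wild type.
    score = (log_probs[alphabet.get_idx(mut_aa)]
             - log_probs[alphabet.get_idx(wt_aa)]).item()
    print(f"{wt_aa}{pos + 1}{mut_aa} log-likelihood ratio: {score:+.3f}")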
