Machine learning to predict continuous protein properties from binary cell sorting data and map unseen sequence space

Marshall Case, Matthew Smith, Jordan Vinh, Greg Thurber

doi:10.1073/pnas.2311726121

Abstract

Proteins are a diverse class of biomolecules responsible for wide-ranging cellular functions, from catalyzing reactions to recognizing pathogens. The ability to evolve proteins rapidly and inexpensively toward improved properties is a common objective for protein engineers. Powerful high-throughput methods like fluorescence-activated cell sorting and next-generation sequencing have dramatically improved directed evolution experiments. However, it is unclear how to best leverage these data to characterize protein fitness landscapes more completely and identify lead candidates. In this work, we develop a simple yet powerful framework to improve protein optimization by predicting continuous protein properties from simple directed evolution experiments using interpretable, linear machine learning models. Importantly, we find that these models, which use data from simple but imprecise experimental estimates of protein fitness, have predictive capabilities that approach more precise but expensive data. Evaluated across five diverse protein engineering tasks, continuous properties are consistently predicted from readily available deep sequencing data, demonstrating that protein fitness space can be reasonably well modeled by linear relationships among sequence mutations. To prospectively test the utility of this approach, we generated a library of stapled peptides and applied the framework to predict affinity and specificity from simple cell sorting data. We then coupled integer linear programming, a method to optimize protein fitness from linear weights, with mutation scores from machine learning to identify variants in unseen sequence space that have improved and co-optimal properties. This approach represents a versatile tool for improved analysis and identification of protein variants across many domains of protein engineering.
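The general idea in the abstract (fit a linear model to binary sort labels, then read its per-mutation weights as a continuous score and search for better variants) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and not the authors' implementation: the 4-letter alphabet, the synthetic `TRUE_W` landscape, the function names, and the choice of logistic regression as the linear model. The search step uses brute-force enumeration as a stand-in for the integer linear programming described in the abstract, which is only practical at this toy scale.

```python
import itertools
import math

ALPHABET = "ACDE"  # toy 4-letter alphabet (assumption); real proteins use 20 residues


def score(seq, w, b=0.0):
    """Additive (linear) score: bias plus the weight of each residue at its position."""
    return b + sum(w[pos * len(ALPHABET) + ALPHABET.index(aa)] for pos, aa in enumerate(seq))


def one_hot(seq):
    """Encode a sequence as a flat 0/1 vector, one block of len(ALPHABET) per position."""
    vec = [0.0] * (len(seq) * len(ALPHABET))
    for pos, aa in enumerate(seq):
        vec[pos * len(ALPHABET) + ALPHABET.index(aa)] = 1.0
    return vec


def fit_logistic(seqs, labels, lr=0.5, epochs=300, l2=1e-3):
    """Batch gradient descent on the L2-regularized logistic loss (one-hot features)."""
    d = len(seqs[0]) * len(ALPHABET)
    w, b, n = [0.0] * d, 0.0, len(seqs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for seq, y in zip(seqs, labels):
            err = 1.0 / (1.0 + math.exp(-score(seq, w, b))) - y
            grad_b += err
            # Only the features that are "on" for this sequence receive gradient.
            for pos, aa in enumerate(seq):
                grad_w[pos * len(ALPHABET) + ALPHABET.index(aa)] += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def best_within(parent, w, max_mutations):
    """Brute-force search for the top-scoring variant within max_mutations of parent.

    Toy stand-in for integer linear programming; it enumerates every sequence.
    """
    best, best_score = parent, score(parent, w)
    for cand in itertools.product(ALPHABET, repeat=len(parent)):
        if sum(a != p for a, p in zip(cand, parent)) > max_mutations:
            continue
        s = score(cand, w)
        if s > best_score:
            best, best_score = "".join(cand), s
    return best


# Synthetic, hypothetical additive landscape used only to generate toy labels.
TRUE_W = [
    {"A": 0.0, "C": 1.0, "D": 2.0, "E": 3.0},
    {"A": 3.0, "C": 0.0, "D": 1.0, "E": 2.0},
    {"A": 1.0, "C": 3.0, "D": 0.0, "E": 2.0},
]


def true_fitness(seq):
    return sum(TRUE_W[pos][aa] for pos, aa in enumerate(seq))


library = ["".join(s) for s in itertools.product(ALPHABET, repeat=3)]
# Binary "sort gate": a variant is labeled 1 only if its hidden fitness clears a threshold.
labels = [1.0 if true_fitness(s) > 4.5 else 0.0 for s in library]

w, b = fit_logistic(library, labels)
ranked = sorted(library, key=lambda s: score(s, w, b), reverse=True)
print("top-ranked variants:", ranked[:3])
print("best variant within 2 mutations of ACD:", best_within("ACD", w, max_mutations=2))
```

The model never sees the continuous fitness values, only the thresholded labels, yet its linear score can still be used to rank variants. For a purely additive score with no constraints, the optimum is simply the best residue at each position; a solver becomes useful once constraints such as a mutation budget or multiple objectives are added.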

Journal: Proceedings of the National Academy of Sciences of the United States of America
Publication Date: Mar 7, 2024
Citations: 1
License type: CC BY-NC-ND 4.0
