Abstract

Machine learning (ML) is revolutionizing our ability to understand and predict the complex relationships between protein sequence, structure, and function. Predictive sequence-function models are enabling protein engineers to efficiently search the sequence space for useful proteins with broad applications in biotechnology. In this review, we highlight the recent advances in applying ML to protein engineering. We discuss supervised learning methods that infer the sequence-function mapping from experimental data and new sequence representation strategies for data-efficient modeling. We then describe the various ways in which ML can be incorporated into protein engineering workflows, including purely in silico searches, ML-assisted directed evolution, and generative models that can learn the underlying distribution of the protein function in a sequence space. ML-driven protein engineering will become increasingly powerful with continued advances in high-throughput data generation, data science, and deep learning.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call