Bayesian optimization with evolutionary and structure-based regularization for directed protein evolution

Trevor S. Frisby, Christopher James Langmead

doi:10.1186/s13015-021-00195-4

Open Access

Journal: Algorithms for Molecular Biology	Publication Date: Jul 1, 2021
Citations: 10	License type: open-access

Affiliation: Carnegie Mellon University

Abstract

Background: Directed evolution (DE) is a technique for protein engineering that involves iterative rounds of mutagenesis and screening to search for sequences that optimize a given property, such as binding affinity to a specified target. Unfortunately, the underlying optimization problem is under-determined, so mutations introduced to improve the specified property may come at the expense of unmeasured but nevertheless important properties (e.g., solubility or thermostability). We address this issue by formulating DE as a regularized Bayesian optimization problem in which the regularization term reflects evolutionary or structure-based constraints.

Results: We applied our approach to three representative proteins, GB1, BRCA1, and SARS-CoV-2 Spike, and evaluated both evolutionary and structure-based regularization terms. The results of these experiments demonstrate that: (i) compared with the unregularized setting, structure-based regularization usually leads to better designs and never hurts; (ii) evolution-based regularization tends to be the least effective; and (iii) regularization leads to better designs because it focuses the search on certain regions of sequence space, making better use of the experimental budget. Additionally, as in previous work on machine-learning-assisted DE, we find that our approach significantly reduces the experimental burden of DE relative to model-free methods.

Conclusion: Introducing regularization into a Bayesian, ML-assisted DE framework alters the exploratory patterns of the underlying optimization routine and can shift variant selections toward those with a range of targeted, desirable properties. In particular, we find that structure-based regularization often improves variant selection compared with unregularized approaches, and never hurts.

Highlights

The field of protein engineering seeks to design molecules with novel or improved properties [1].
We report the results of five approaches to performing directed evolution (DE): (i) single-mutation walk; (ii) recombination; (iii) Bayesian optimization using standard acquisition functions (Eqs. 7–9), denoted ‘Gaussian process (GP) + expected improvement (EI)’, ‘GP + probability of improvement (PI)’, or ‘GP + upper (or lower) confidence bound (UCB)’; (iv) Bayesian optimization using evolution-based regularized acquisition functions with transformer protein language model (TPLM)-derived log-odds, denoted ‘GP + EI + TPLM’, ‘GP + PI + TPLM’, or ‘GP + UCB + TPLM’; and (v) Bayesian optimization using structure-based regularized acquisition functions with FoldX-derived ΔG values, denoted ‘GP + EI + FoldX’, ‘GP + PI + FoldX’, or ‘GP + UCB + FoldX’.
In Additional file 1: Fig. S1, we show that evolution-based regularization via GREMLIN and profile hidden Markov models (HMMs) can improve upon traditional DE techniques on the same Streptococcal protein G B1 domain (GB1) variant-selection task.
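The acquisition functions named in the highlights can be illustrated with a short sketch. The paper's Eqs. 7–9 are not reproduced on this page, so the code below assumes the textbook forms of EI, PI, and UCB for a maximization problem, computed from a Gaussian process posterior mean `mu` and standard deviation `sigma` for each candidate variant. It further assumes that regularization is a simple additive term, `acquisition + lam * reg`, where `reg` is a per-variant score such as a TPLM log-odds or a negated FoldX-derived ΔG (so that more stable variants score higher). The additive form, the weight `lam`, the toy numbers, and every function and variant name here are hypothetical illustrations, not the authors' formulation.

```python
import math
from typing import Dict, Tuple


def norm_pdf(z: float) -> float:
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Textbook EI for maximization; xi is an optional exploration margin."""
    gain = mu - best - xi
    if sigma <= 0.0:
        return max(gain, 0.0)
    z = gain / sigma
    return gain * norm_cdf(z) + sigma * norm_pdf(z)


def probability_of_improvement(mu: float, sigma: float, best: float, xi: float = 0.0) -> float:
    """Textbook PI for maximization."""
    if sigma <= 0.0:
        return 1.0 if mu > best + xi else 0.0
    return norm_cdf((mu - best - xi) / sigma)


def upper_confidence_bound(mu: float, sigma: float, beta: float = 2.0) -> float:
    """Textbook UCB; beta trades off exploitation (mu) against exploration (sigma)."""
    return mu + math.sqrt(beta) * sigma


def select_next(
    posterior: Dict[str, Tuple[float, float]],
    reg: Dict[str, float],
    best: float,
    lam: float = 0.0,
    acq: str = "ei",
) -> str:
    """Return the candidate maximizing acquisition + lam * reg (ASSUMED additive form).

    posterior maps variant -> (GP mean, GP std); reg maps variant -> a hypothetical
    regularization score where higher is better (e.g. TPLM log-odds or negated FoldX ΔG).
    """
    def score(variant: str) -> float:
        mu, sigma = posterior[variant]
        if acq == "ei":
            a = expected_improvement(mu, sigma, best)
        elif acq == "pi":
            a = probability_of_improvement(mu, sigma, best)
        elif acq == "ucb":
            a = upper_confidence_bound(mu, sigma)
        else:
            raise ValueError(f"unknown acquisition function: {acq}")
        return a + lam * reg.get(variant, 0.0)

    return max(posterior, key=score)


# Hypothetical toy posterior: variant_a looks better to the GP but is heavily penalized.
posterior = {"variant_a": (1.0, 0.5), "variant_b": (0.9, 0.5)}
reg = {"variant_a": -5.0, "variant_b": 0.0}
print(select_next(posterior, reg, best=0.8, lam=0.0))  # variant_a
print(select_next(posterior, reg, best=0.8, lam=0.1))  # variant_b
```

With `lam = 0` the sketch reduces to the unregularized GP + EI choice; raising `lam` lets a strongly penalized variant lose to one the surrogate rates slightly lower, which is one way to picture how a regularizer steers the search toward particular regions of sequence space.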

Summary

Introduction

The field of protein engineering seeks to design molecules with novel or improved properties [1]. The primary techniques used in protein engineering fall into two broad categories: rational design [2] and directed evolution (DE) [3]. In contrast to rational design, DE involves iterative rounds of saturation mutagenesis at select residue positions, followed by in vitro or in vivo screening for desirable traits, in search of sequences that optimize a given property, such as binding affinity to a specified target. The primary steps in directed evolution are: (i) random mutagenesis, to create a library of variants; (ii) screening, to identify variants with the desired traits; and (iii) amplification of the best variants, to seed the next round. The exploratory aspect of DE is effectively a strategy for escaping local optima on the underlying fitness landscape.
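The three primary steps of directed evolution can be sketched as a model-free loop, the kind of baseline whose experimental burden ML-assisted DE aims to reduce. This is a minimal illustration under stated assumptions: sequences are strings over the 20 standard amino acids, "random mutagenesis" is modeled as one random single-residue substitution per library member, "screening" is a call to a hypothetical `fitness` oracle rather than an in vitro or in vivo assay, and "amplification" keeps the `top_k` highest-scoring variants as parents for the next round. All names (`mutate`, `directed_evolution`, `toy_fitness`, `TARGET`) are hypothetical.

```python
import random
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(seq: str, rng: random.Random) -> str:
    """Random mutagenesis stand-in: one random single-residue substitution."""
    pos = rng.randrange(len(seq))
    options = [aa for aa in AMINO_ACIDS if aa != seq[pos]]
    return seq[:pos] + rng.choice(options) + seq[pos + 1:]


def directed_evolution(
    parent: str,
    fitness: Callable[[str], float],
    rounds: int = 5,
    library_size: int = 20,
    top_k: int = 3,
    seed: int = 0,
) -> Tuple[str, float, int]:
    """Model-free DE loop; returns (best sequence, its fitness, number of variants screened)."""
    rng = random.Random(seed)
    parents: List[str] = [parent]
    best_fit, best_seq = fitness(parent), parent
    screened = 1  # the parent itself is measured once
    for _ in range(rounds):
        # (i) random mutagenesis: build a library from the current parents
        library = [mutate(rng.choice(parents), rng) for _ in range(library_size)]
        # (ii) screening: measure every library member (here, call the toy oracle)
        scored = sorted(((fitness(s), s) for s in library), reverse=True)
        screened += len(library)
        # (iii) amplification: the top_k variants seed the next round
        parents = [s for _, s in scored[:top_k]]
        if scored[0][0] > best_fit:
            best_fit, best_seq = scored[0]
    return best_seq, best_fit, screened


# Hypothetical toy oracle: number of positions matching an arbitrary target sequence.
TARGET = "WWLA"


def toy_fitness(seq: str) -> int:
    return sum(a == b for a, b in zip(seq, TARGET))


print(directed_evolution("AAAA", toy_fitness))
```

The returned `screened` count is a crude proxy for the experimental budget: every round costs `library_size` measurements regardless of how promising the library is, which is the cost that model-guided variant selection tries to cut.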

Similar Papers

Machine learning with knowledge constraints for process optimization of open-air perovskite solar cell manufacturing
Zhe Liu ... Thomas W Colburn
Joule | VOL. 6 | 01 Apr 2022

Faux-Data Injection Optimization for Accelerating Data-Driven Discovery of Materials
Abdul Wahab Ziaullah ... Fedwa El-Mellouhi
Integrating Materials and Manufacturing Innovation | VOL. 12 | 01 Jun 2023

An Efficient Batch-Constrained Bayesian Optimization Approach for Analog Circuit Synthesis via Multiobjective Acquisition Ensemble
Shuhan Zhang ... Dian Zhou
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | VOL. 41 | 26 Jan 2021

An Efficient Multi-Objective Bayesian Optimization Approach for the Automated Analytical Design of Switched Reluctance Machines
Shen Zhang ... Thomas G Habetler
01 Sep 2018
