Cluster learning-assisted directed evolution.

Yuchi Qiu, Guo-Wei Wei, Jian Hu

doi:10.1038/s43588-021-00168-y


Open Access

https://doi.org/10.1038/s43588-021-00168-y


Journal: Nature Computational Science	Publication Date: Dec 1, 2021
Citations: 43	License type: cc-by

Affiliation: Michigan State University

Abstract

Directed evolution, a strategy for protein engineering, optimizes protein properties (i.e., fitness) by expensive and time-consuming screening or selection of a large mutational sequence space. Machine learning-assisted directed evolution (MLDE), which screens sequence properties in silico, can accelerate the optimization and reduce the experimental burden. This work introduces an MLDE framework, cluster learning-assisted directed evolution (CLADE), that combines hierarchical unsupervised clustering sampling and supervised learning to guide protein engineering. The clustering sampling selectively picks and screens variants in targeted subspaces, which guides the subsequent generation of diverse training sets. In the last stage, accurate predictions from supervised learning models improve the final outcomes. By sequentially screening 480 sequences out of 160,000 in a four-site combinatorial library with five equal experimental batches, CLADE achieves global maximal fitness hit rates of up to 91.0% and 34.0% for the GB1 and PhoQ datasets, respectively, compared with 18.6% and 7.2% for random-sampling-based MLDE.
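The screening budget in the abstract can be checked with a few lines of arithmetic. This sketch assumes each of the four sites is varied over the 20 canonical amino acids (20^4 = 160,000). The helper `global_max_hit_rate` is hypothetical and encodes one reading of the metric: the fraction of independent repeated runs whose screened variants include the global fitness maximum.

```python
# Screening budget from the abstract: a four-site combinatorial library over
# the 20 canonical amino acids, screened in five equal batches.
N_SITES = 4
N_AMINO_ACIDS = 20
TOTAL_SCREENED = 480
N_BATCHES = 5

library_size = N_AMINO_ACIDS ** N_SITES            # 160,000 variants
batch_size = TOTAL_SCREENED // N_BATCHES           # 96 variants per batch
fraction_screened = TOTAL_SCREENED / library_size  # 0.003 of the library


def global_max_hit_rate(run_hits: list[bool]) -> float:
    """Fraction of independent repeats that found the global maximum.

    Assumption (hypothetical helper): each entry of `run_hits` records whether
    one repeated run ended with the globally fittest variant among its picks.
    """
    if not run_hits:
        raise ValueError("need at least one run")
    return sum(run_hits) / len(run_hits)


print(library_size, batch_size, f"{fraction_screened:.1%}")  # 160000 96 0.3%
```

In other words, CLADE's reported hit rates are obtained after screening 0.3% of the combinatorial space.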

Highlights

Directed evolution, a strategy for protein engineering, optimizes protein properties by expensive and time-consuming screening or selection of a large mutational sequence space
The cluster learning-assisted directed evolution (CLADE) framework is a two-stage procedure consisting of three components: experimental screening, unsupervised clustering and supervised learning
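The two stages can be illustrated end to end on a toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation: the 2-site, 400-variant library, one-hot encoding, synthetic additive landscape `true_fitness`, a single flat K-means level standing in for hierarchical clustering, the fitness-weighted cluster-choice rule, and ridge regression as the supervised model are all illustrative choices. `kmeans`, `cluster_sampling`, and `fit_ridge` are hypothetical helper names; `true_fitness` plays the role of experimental screening.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy library: 2 sites x 20 amino acids = 400 variants, one-hot
# encoded into 40 features (CLADE itself uses a four-site library).
N_SITES, N_AA = 2, 20
variants = [(a, b) for a in range(N_AA) for b in range(N_AA)]
X = np.zeros((len(variants), N_SITES * N_AA))
for i, (a, b) in enumerate(variants):
    X[i, a] = 1.0
    X[i, N_AA + b] = 1.0

# Hypothetical additive landscape standing in for wet-lab fitness measurements.
site_effects = rng.normal(size=N_SITES * N_AA)
true_fitness = X @ site_effects


def kmeans(data, k, n_iter=50):
    """Plain Lloyd's K-means; returns one cluster label per row of `data`."""
    centers = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(n_iter):
        dists = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = data[labels == c].mean(axis=0)
    return labels


def cluster_sampling(labels, n_batches, batch_size):
    """Stage 1 sketch: clusters decide where to screen next.

    Assumption: the first batch picks clusters uniformly; later batches favor
    clusters with higher mean observed fitness, then screen a random
    unscreened variant from the chosen cluster.
    """
    k = int(labels.max()) + 1
    is_screened = np.zeros(len(labels), dtype=bool)
    for batch in range(n_batches):
        means = np.array([
            true_fitness[is_screened & (labels == c)].mean()
            if np.any(is_screened & (labels == c)) else 0.0
            for c in range(k)
        ])
        weights = np.ones(k) if batch == 0 else means - means.min() + 1e-3
        for _ in range(batch_size):
            # Skip clusters whose variants have all been screened already.
            has_unscreened = np.array(
                [np.any(~is_screened & (labels == c)) for c in range(k)])
            w = weights * has_unscreened
            c = rng.choice(k, p=w / w.sum())
            candidates = np.flatnonzero(~is_screened & (labels == c))
            is_screened[rng.choice(candidates)] = True
    return np.flatnonzero(is_screened)


def fit_ridge(X_train, y_train, alpha=1.0):
    """Stage 2 sketch: closed-form ridge regression as the supervised model."""
    A = X_train.T @ X_train + alpha * np.eye(X_train.shape[1])
    return np.linalg.solve(A, X_train.T @ y_train)


labels = kmeans(X, k=8)
train_idx = cluster_sampling(labels, n_batches=4, batch_size=12)
coef = fit_ridge(X[train_idx], true_fitness[train_idx])
predicted = X @ coef
predicted[train_idx] = -np.inf                # rank only unscreened variants
top_picks = np.argsort(predicted)[::-1][:12]  # final batch chosen by the model
best_found = true_fitness[np.concatenate([train_idx, top_picks])].max()
print(f"best found {best_found:.2f}, global max {true_fitness.max():.2f}")
```

The 4 + 1 batch split here is chosen only for illustration; it echoes the abstract's five equal batches at a much smaller scale.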
Similar search approaches that use a hierarchical tree, such as hierarchical optimistic optimization (HOO)[47], deterministic optimistic optimization (DOO) and simultaneous optimistic optimization (SOO)[48], were previously proposed to optimize smooth black-box functions defined on a continuous space
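For comparison with the tree-search methods named above, here is a minimal DOO sketch on the continuous interval [0, 1], the smooth black-box setting these methods target. It is not part of CLADE, whose hierarchy is built by clustering discrete sequences. Assumptions: ternary cell splits, a known upper bound on the Lipschitz constant used as the optimism term, and the toy objective -|x - 0.3|; `doo_maximize` is a hypothetical name.

```python
def doo_maximize(f, n_expansions=60, lipschitz=10.0):
    """Deterministic optimistic optimization (DOO) on [0, 1], as a sketch.

    Assumption: f is Lipschitz with a constant no larger than `lipschitz`, so
    on a cell of width w, f can exceed its value at the cell center by at most
    lipschitz * w / 2. That bound plays the role of DOO's delta(h). Each
    expansion splits the chosen cell into three equal sub-cells.
    """
    leaves = [(0.0, 1.0, f(0.5))]  # (left edge, right edge, f at center)
    best_x, best_val = 0.5, leaves[0][2]
    for _ in range(n_expansions):
        # Expand the leaf with the largest optimistic upper bound.
        i = max(
            range(len(leaves)),
            key=lambda j: leaves[j][2] + lipschitz * (leaves[j][1] - leaves[j][0]) / 2,
        )
        left, right, _ = leaves.pop(i)
        width = (right - left) / 3
        for s in range(3):
            a = left + s * width
            x = a + width / 2
            val = f(x)
            leaves.append((a, a + width, val))
            if val > best_val:
                best_x, best_val = x, val
    return best_x, best_val


best_x, best_val = doo_maximize(lambda x: -abs(x - 0.3))
print(f"x* ~ {best_x:.4f}, f(x*) ~ {best_val:.4f}")
```

Because the optimism term overestimates the true Lipschitz constant (1 for this objective), DOO explores broadly at shallow depths before refining the cells around x = 0.3.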

Summary

Introduction

Directed evolution, a strategy for protein engineering, optimizes protein properties (that is, fitness) by expensive and time-consuming screening or selection of a large mutational sequence space. Active learning is a popular approach in MLDE, in which sequential selections of sequences are decided by the combination of a surrogate model and an acquisition function. The former learns the sequence-to-fitness map from labeled data, and the latter uses the surrogate model's predictions to prioritize a set of sequences to be screened in the next round of experiments[37]. Rather than relying on sequential experimental iterations, focused-training MLDE was proposed to reduce the experimental burden to only two iterations[2]. This approach uses unsupervised zero-shot predictors[19,22,40,41] to predict fitness without experiments and restricts training-set selection to a small, informative subset.
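The surrogate-plus-acquisition split and the zero-shot restriction described above can be sketched in a few functions. Assumptions: the surrogate is an ensemble whose predictions arrive as an array, upper confidence bound (UCB) is used as the acquisition function (a common choice, not necessarily the one in ref. [37]), and the zero-shot scores are placeholders for any unsupervised predictor. `ucb_acquisition`, `select_batch`, `zero_shot_filter`, `beta`, and `keep_fraction` are hypothetical names.

```python
import numpy as np


def ucb_acquisition(ensemble_preds, beta=2.0):
    """Upper confidence bound: ensemble mean plus beta times ensemble spread.

    ensemble_preds has shape (n_models, n_candidates); each row holds one
    surrogate model's predicted fitness for every candidate sequence.
    """
    return ensemble_preds.mean(axis=0) + beta * ensemble_preds.std(axis=0)


def select_batch(ensemble_preds, batch_size, beta=2.0):
    """Indices of the `batch_size` candidates with the highest UCB score."""
    scores = ucb_acquisition(ensemble_preds, beta)
    return np.argsort(scores)[::-1][:batch_size]


def zero_shot_filter(zero_shot_scores, keep_fraction=0.1):
    """Keep the top-scoring fraction of variants under a zero-shot predictor.

    The retained indices form the restricted pool from which training
    sequences are drawn; no experiments are needed to build the filter.
    """
    n_keep = max(1, int(len(zero_shot_scores) * keep_fraction))
    return np.argsort(zero_shot_scores)[::-1][:n_keep]


# Toy usage: two surrogate models scoring three candidates.
preds = np.array([[1.0, 2.0, 0.5],
                  [1.0, 0.0, 0.7]])
print(select_batch(preds, 2))  # [1 0]
print(zero_shot_filter(np.array([0.1, 0.9, 0.4, 0.8]), keep_fraction=0.5))  # [1 3]
```

Candidate 1 wins despite a mean no higher than candidate 0 because the models disagree on it; UCB rewards that uncertainty, which is how an acquisition function trades exploration against exploitation.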



Similar Papers

Protein engineering in the 21st century.
Roberto A. Chica
Protein science: a publication of the Protein Society | VOL. 24
11 Mar 2015

Protein Engineering for Improving and Diversifying Natural Product Biosynthesis
Chenyi Li ... Yajun Yan
Trends in Biotechnology | VOL. 38
15 Jan 2020

GRAPE, a greedy accumulated strategy for computational protein engineering.
Jinyuan Sun ... Bian Wu
Methods in enzymology | VOL. 648
01 Jan 2020

Protein Engineering Strategies for Tailoring the Physical and Catalytic Properties of Enzymes for Defined Industrial Applications.
Jagdeep Kaur ... Rakesh Kumar
Current protein & peptide science | VOL. 24
01 Feb 2023
