Inferring Protein Sequence-Function Relationships with Large-Scale Positive-Unlabeled Learning.

Hyebin Song, Emily C Hinds, Garvesh Raskutti, Philip A Romero, Bennett J Bremer

doi:10.1016/j.cels.2020.10.007


Open Access

https://doi.org/10.1016/j.cels.2020.10.007


Abstract

Machine learning can infer how protein sequence maps to function without requiring a detailed understanding of the underlying physical or biological mechanisms. It is challenging to apply existing supervised learning frameworks to large-scale experimental data generated by deep mutational scanning (DMS) and related methods. DMS data often contain high-dimensional and correlated sequence variables, experimental sampling error and bias, and missing data. Notably, most DMS data do not contain examples of negative sequences, making it challenging to directly estimate how sequence affects function. Here, we develop a positive-unlabeled (PU) learning framework to infer sequence-function relationships from large-scale DMS data. Our PU learning method displays excellent predictive performance across ten large-scale sequence-function datasets, representing proteins of different folds, functions, and library types. The estimated parameters pinpoint key residues that dictate protein structure and function. Finally, we apply our statistical sequence-function model to design highly stabilized enzymes.
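
The abstract does not spell out the estimator, so the following is only a minimal sketch of the general PU idea and not the authors' method. It assumes a generic Elkan-Noto style correction: fit a classifier that separates labeled (selected) sequences from unlabeled ones, then rescale its output by a constant `c` estimated on the known positives. The toy sequences, the one-hot encoding, the hyperparameters, and all function names are hypothetical choices made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(seq):
    """Encode a sequence as a flat 0/1 vector with 20 entries per position."""
    return [1.0 if residue == aa else 0.0 for residue in seq for aa in AMINO_ACIDS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, x):
    """Estimated probability that x carries the 'labeled' flag, P(s=1 | x)."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_logistic(X, s, lr=0.5, epochs=500, l2=1e-3):
    """L2-regularized logistic regression fitted by batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, si in zip(X, s):
            err = score(w, b, x) - si
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return w, b

# Hypothetical toy library: sequences recovered after selection are "positive";
# the unselected input library is "unlabeled" (a mix of functional and not).
positives = ["ACD", "AKD", "ACE", "AKE"]
unlabeled = ["ACD", "GCD", "GKE", "AKE", "GCE", "GKD"]

X = [one_hot(seq) for seq in positives + unlabeled]
s = [1.0] * len(positives) + [0.0] * len(unlabeled)
w, b = fit_logistic(X, s)

# Elkan-Noto constant: average labeled-probability over the known positives.
# (Estimated on the training positives for brevity; a held-out set is usual.)
c = sum(score(w, b, one_hot(seq)) for seq in positives) / len(positives)

def positive_prob(seq):
    """Rescale P(s=1 | x) by c to approximate P(functional | x), capped at 1."""
    return min(1.0, score(w, b, one_hot(seq)) / c)
```

In this toy setup every selected sequence starts with `A`, so `positive_prob("AKD")` comes out higher than `positive_prob("GKD")`, even though no sequence was ever labeled as negative. The rescaling step relies on the assumption that labeled positives are a random sample of all positives, which real DMS selections may only approximate.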


Journal: Cell Systems	Publication Date: Nov 18, 2020
Citations: 41	License type: publisher-specific-oa

Similar Papers

High-throughput directed evolution: a golden era for protein science
Romany J Mclure ... David J Brockwell
Trends in Chemistry | VOL. 4
09 Mar 2022

3-Methyladenine-DNA Glycosylase (MPG Protein) Interacts with Human RAD23 Proteins
Feng Miao ... Timothy R O'Connor
Journal of Biological Chemistry | VOL. 275
01 Sep 2000

FASTAptamer: A Bioinformatic Toolkit for High-throughput Sequence Analysis of Combinatorial Selections.
Khalid K Alam ... Donald H Burke
Molecular Therapy - Nucleic Acids | VOL. 4
01 Jan 2015

Biochemical and Functional Characterization of the Klotho-VS Polymorphism Implicated in Aging and Disease Risk
Tracey B Tucker Zhou ... Carmela R Abraham
Journal of Biological Chemistry | VOL. 288
01 Dec 2013
