Neural networks to learn protein sequence–function relationships from deep mutational scanning data

Sam Gelman, Sarah A Fahlberg, Pete Heinzelman, Philip A Romero, Anthony Gitter

doi:10.1073/pnas.2104878118

Open Access

https://doi.org/10.1073/pnas.2104878118

Abstract

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein's behavior and properties. We present a supervised deep learning framework to learn the sequence–function mapping from deep mutational scanning data and make predictions for new, uncharacterized sequence variants. We test multiple neural network architectures, including a graph convolutional network that incorporates protein structure, to explore how a network's internal representation affects its ability to learn the sequence–function mapping. Our supervised learning approach displays superior performance over physics-based and unsupervised prediction methods. We find that networks that capture nonlinear interactions and share parameters across sequence positions are important for learning the relationship between sequence and function. Further analysis of the trained models reveals the networks' ability to learn biologically meaningful information about protein structure and mechanism. Finally, we demonstrate the models' ability to navigate sequence space and design new proteins beyond the training set. We applied the protein G B1 domain (GB1) models to design a sequence that binds to immunoglobulin G with substantially higher affinity than wild-type GB1.
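The two properties the abstract singles out, nonlinear interactions and parameters shared across sequence positions, can be illustrated with a minimal sketch. This is not the authors' implementation: the toy sequences, filter sizes, random weights, and function names (`one_hot`, `conv1d`, `predict_score`) are all assumptions made for illustration, and the model is untrained, so its scores carry no biological meaning.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(seq):
    """Encode a protein sequence as a (length, 20) one-hot matrix."""
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    x[np.arange(len(seq)), [AA_INDEX[aa] for aa in seq]] = 1.0
    return x


def conv1d(x, filters):
    """Valid 1D convolution: the same filters are applied to every window.

    x: (length, channels); filters: (width, channels, n_filters).
    Returns an array of shape (length - width + 1, n_filters).
    """
    width = filters.shape[0]
    windows = np.stack([x[i:i + width] for i in range(x.shape[0] - width + 1)])
    return np.einsum("lwc,wcf->lf", windows, filters)


def predict_score(seq, filters, dense_w, dense_b):
    """Convolution -> ReLU (the nonlinearity) -> flatten -> linear output."""
    hidden = np.maximum(conv1d(one_hot(seq), filters), 0.0)
    return float(hidden.ravel() @ dense_w + dense_b)


# Hypothetical 10-residue sequences, not taken from the paper.
wild_type = "MKTAYIAKQR"
variant = "MKTAYIGKQR"  # substitution at position 7 (A -> G)

# Random, untrained weights: 4 filters of width 3, then one linear output.
rng = np.random.default_rng(0)
filters = rng.normal(size=(3, len(AMINO_ACIDS), 4))
dense_w = rng.normal(size=(len(wild_type) - 3 + 1) * 4)
dense_b = 0.0

print("wild type:", predict_score(wild_type, filters, dense_w, dense_b))
print("variant:  ", predict_score(variant, filters, dense_w, dense_b))
```

Because the filters slide along the sequence, a pattern of residues is detected by the same weights wherever it occurs. In a supervised setting, the weights would be fit by regressing the predicted score onto measured functional scores from a deep mutational scan.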

Highlights

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties.
We develop a deep learning framework to learn from large-scale sequence–function data generated by deep mutational scanning.
We evaluated the predictive performance of the different network architectures on five diverse deep mutational scanning datasets representing proteins of varying sizes, folds, and functions: Aequorea victoria green fluorescent protein, β-glucosidase (Bgl3), protein G B1 domain (GB1), poly(A)-binding protein (Pab1), and ubiquitination factor E4B (Ube4b) (Fig. 2A and Table 1).

Summary

Introduction

The mapping from protein sequence to function is highly complex, making it challenging to predict how sequence changes will affect a protein’s behavior and properties. Understanding the mapping from protein sequence to function is important for describing natural evolutionary processes, diagnosing genetic disease, and designing new proteins with useful properties. This mapping is shaped by thousands of intricate molecular interactions, dynamic conformational ensembles, and nonlinear relationships between biophysical properties. The volume of protein data has exploded over the last decade with advances in DNA sequencing, three-dimensional structure determination, and high-throughput screening. With these increasing data, statistics and machine learning approaches have emerged as powerful methods to understand the complex mapping from protein sequence to function. There is a current need for general, easy-to-use supervised learning methods that can leverage large sequence–function datasets to predict specific molecular phenotypes with the high accuracy required for protein design.

Keywords: protein engineering | deep learning | convolutional neural network

Journal: Proceedings of the National Academy of Sciences
Publication Date: Nov 23, 2021
Citations: 94
License type: CC BY 4.0
