Protein Design with Deep Learning.

Marianne Defresne, Sophie Barbe, Thomas Schiex

doi:10.3390/ijms222111741

Abstract

Computational Protein Design (CPD) has produced impressive results for engineering new proteins, resulting in a wide variety of applications. In the past few years, various efforts have aimed at replacing or improving existing design methods using Deep Learning technology to leverage the amount of publicly available protein data. Deep Learning (DL) is a very powerful tool to extract patterns from raw data, provided that data are formatted as mathematical objects and the architecture processing them is well suited to the targeted problem. In the case of protein data, specific representations are needed for both the amino acid sequence and the protein structure in order to capture respectively 1D and 3D information. As no consensus has been reached about the most suitable representations, this review describes the representations used so far, discusses their strengths and weaknesses, and details their associated DL architecture for design and related tasks.

Highlights

The wide variety of natural proteins fulfills many different functions, from catalysis to specific recognition, transport, or regulation.
The most usual approach to Computational Protein Design (CPD) consists in choosing, or constructing de novo, a target backbone structure that could carry the function of interest, and then identifying a sequence that will fold onto this backbone and present the expected properties.
This formulation is convenient for developing algorithms, but it makes CPD an ill-posed problem: the sequence is optimized for the target structure, but this structure may not be optimal for the sequence, which may fold into a different structure [13,14].

Summary

Introduction

The wide variety of natural proteins fulfills many different functions, from catalysis to specific recognition, transport, or regulation. After some background on CPD and Deep Learning, we present the different types of representation that have been used for protein data, both sequences and structures, in design or related tasks. We discuss their strengths and weaknesses, and detail the neural architectures used to process them. The usual backbone-based approach allows the design problem to be formulated as an optimization problem: given an input backbone, find a sequence that maximally stabilizes this backbone (and fulfills the desired function) by minimizing a score function that usually combines the free energy of the resulting protein with other function-related criteria. This formulation is convenient for developing algorithms, but it makes CPD an ill-posed problem: the sequence is optimized for the target structure, but this structure may not be optimal for the sequence, which may fold into a different structure [13,14]. We focus on the pure sequence design task, which aims at producing a sequence that should either fold into a target backbone or, for some methods, present a desired function.
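The optimization view of sequence design described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the backbone is reduced to a hypothetical list of buried/exposed flags, `toy_score` is a stand-in for a real energy-based score function (it only counts hydrophobic/polar mismatches), and simulated annealing is one possible search strategy among many, not the method of any particular CPD tool.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed, simplified hydrophobic class; real score functions are far richer.
HYDROPHOBIC = set("AVILMFWC")


def toy_score(sequence, buried):
    """Hypothetical score (lower is better): number of positions where the
    residue class (hydrophobic or not) disagrees with the burial flag."""
    return sum((aa in HYDROPHOBIC) != flag for aa, flag in zip(sequence, buried))


def design_sequence(buried, steps=5000, temperature=1.0, cooling=0.999, seed=0):
    """Search for a sequence minimizing toy_score for a fixed 'backbone'
    (here just burial flags) using simulated annealing."""
    if not buried:
        return "", 0
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in buried]
    score = toy_score(seq, buried)
    best_seq, best_score = list(seq), score
    for _ in range(steps):
        # Propose a single-point mutation.
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_score = toy_score(seq, buried)
        delta = new_score - score
        # Metropolis criterion: always accept improvements, sometimes accept
        # worse sequences while the temperature is still high.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            score = new_score
            if score < best_score:
                best_seq, best_score = list(seq), score
        else:
            seq[pos] = old  # reject the mutation
        temperature = max(temperature * cooling, 1e-3)
    return "".join(best_seq), best_score


if __name__ == "__main__":
    backbone = [True, False, True, True, False, False, True, False]
    sequence, score = design_sequence(backbone)
    print(sequence, score)
```

The sketch also makes the ill-posedness visible: many different sequences reach the same minimal score, and nothing in the score checks whether the chosen sequence would prefer a different backbone.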

Evaluation of Design Methods

Background on Deep Learning

Training

Recurrent Architectures

Attention Models

Generative Models

Representation of the Protein Sequence

One-Hot Encoding

Learned Embedding

Position-Specific Scoring Matrices

Representing the Protein Structure

Sequential and Hand-Crafted Representations

Voxel Representation

Distance Maps

Graphs

Conclusions


Journal: International journal of molecular sciences	Publication Date: Oct 29, 2021
Citations: 27	License type: CC BY 4.0
