A maximum likelihood framework for protein design

Claudia L Kleinman, Nicolas Lartillot, Cécile Bonnard, Nicolas Rodrigue, Hervé Philippe

doi:10.1186/1471-2105-7-326

Abstract

Background: The aim of protein design is to predict amino-acid sequences compatible with a given target structure. Traditionally envisioned as a purely thermodynamic question, this problem can also be understood in a wider context, where additional constraints are captured by learning the sequence patterns displayed by natural proteins of known conformation. In this latter perspective, however, we still need a theoretical formalization of the question, leading to general and efficient learning methods, and allowing for the selection of fast and accurate objective functions quantifying sequence/structure compatibility.

Results: We propose a formulation of the protein design problem in terms of model-based statistical inference. Our framework uses the maximum likelihood principle to optimize the unknown parameters of a statistical potential, which we call an inverse potential to contrast with classical potentials used for structure prediction. We propose an implementation based on Markov chain Monte Carlo, in which the likelihood is maximized by gradient descent and is numerically estimated by thermodynamic integration. The fit of the models is evaluated by cross-validation. We apply this to a simple pairwise contact potential, supplemented with a solvent-accessibility term, and show that the resulting models have a better predictive power than currently available pairwise potentials. Furthermore, the model comparison method presented here allows one to measure the relative contribution of each component of the potential, and to choose the optimal number of accessibility classes, which turns out to be much higher than classically considered.

Conclusion: Altogether, this reformulation makes it possible to test a wide diversity of models, using different forms of potentials, or accounting for other factors than just the constraint of thermodynamic stability. Ultimately, such model-based statistical analyses may help to understand the forces shaping protein sequences, and driving their evolution.
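To make the maximum likelihood idea in the Results paragraph more concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a Boltzmann-like sequence distribution p(s) proportional to exp(-E(s)), where E(s) sums a symmetric pairwise contact term over a fixed contact map. The two-letter alphabet, the four-residue contact map, the training sequences, and all function names are hypothetical. Because the toy sequence space has only 16 members, the partition function is computed by exact enumeration, which stands in for the Markov chain Monte Carlo estimate described in the abstract; the solvent-accessibility term, thermodynamic integration, and cross-validation are omitted.

```python
import itertools
import math

# Toy setup (all assumptions): two-letter alphabet and a made-up contact map.
ALPHABET = "HP"
LENGTH = 4
CONTACTS = [(0, 3), (1, 2), (0, 1)]
ALL_SEQS = ["".join(s) for s in itertools.product(ALPHABET, repeat=LENGTH)]


def pair_key(a, b):
    """Order a residue pair so the contact potential is symmetric."""
    return (a, b) if a <= b else (b, a)


def contact_counts(seq):
    """Count how many contacts of each residue-pair type the sequence makes."""
    counts = {}
    for i, j in CONTACTS:
        key = pair_key(seq[i], seq[j])
        counts[key] = counts.get(key, 0) + 1
    return counts


def energy(seq, eps):
    """Pseudo-energy: sum of pairwise contact parameters over the contact map."""
    return sum(eps[key] * n for key, n in contact_counts(seq).items())


def log_partition(eps):
    """Exact log Z by enumeration (feasible only for this tiny toy space)."""
    return math.log(sum(math.exp(-energy(s, eps)) for s in ALL_SEQS))


def log_likelihood(data, eps):
    """Log-likelihood of the observed sequences under p(s) = exp(-E(s)) / Z."""
    log_z = log_partition(eps)
    return sum(-energy(s, eps) - log_z for s in data)


def gradient(data, eps):
    """d logL / d eps = N * (expected contact counts) - (observed contact counts)."""
    log_z = log_partition(eps)
    grad = {key: 0.0 for key in eps}
    for s in ALL_SEQS:
        p = math.exp(-energy(s, eps) - log_z)
        for key, n in contact_counts(s).items():
            grad[key] += len(data) * p * n
    for s in data:
        for key, n in contact_counts(s).items():
            grad[key] -= n
    return grad


def fit(data, steps=200, lr=0.05):
    """Plain gradient ascent on the log-likelihood, starting from a flat potential."""
    eps = {pair_key(a, b): 0.0 for a in ALPHABET for b in ALPHABET}
    for _ in range(steps):
        grad = gradient(data, eps)
        for key in eps:
            eps[key] += lr * grad[key]
    return eps


# Hypothetical "natural" sequences assumed to share the toy structure.
DATA = ["HPPH", "HPHH", "HHPH", "PHPH"]
FLAT = {pair_key(a, b): 0.0 for a in ALPHABET for b in ALPHABET}
FITTED = fit(DATA)
```

Comparing `log_likelihood(DATA, FITTED)` with `log_likelihood(DATA, FLAT)` shows how much the fitted potential improves on a flat one. In this toy data set P-P contacts are rarer than under the flat model, so the fit assigns them a higher pseudo-energy. For realistic sequence lengths and a 20-letter alphabet, enumeration is infeasible, which is why a sampling-based estimate of the expected counts would be needed instead.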

Highlights

The aim of protein design is to predict amino-acid sequences compatible with a given target structure
As an alternative to the engineering approach, a more evolutionary stance can be taken towards the inverse folding problem, in which case the aim would rather be to predict the sequences of natural proteins having the conformation of interest
Seen from this new point of view, the design problem raises new questions: natural proteins are the result of a complex evolutionary process, involving an intricate interplay between mutation and selection, and this probably entails many constraints directly related to the native conformation, but not equivalent to the mere requirement of structural stability

Summary

Introduction

The aim of protein design is to predict amino-acid sequences compatible with a given target structure. Traditionally envisioned as a purely thermodynamic question, this problem can also be understood in a wider context, where additional constraints are captured by learning the sequence patterns displayed by natural proteins of known conformation. In this latter perspective, we still need a theoretical formalization of the question, leading to general and efficient learning methods, and allowing for the selection of fast and accurate objective functions quantifying sequence/structure compatibility. As an alternative to the engineering approach, a more evolutionary stance can be taken towards the inverse folding problem, in which case the aim would rather be to predict the sequences of natural proteins having the conformation of interest. Seen from this new point of view, the design problem raises new questions: natural proteins are the result of a complex evolutionary process, involving an intricate interplay between mutation and selection, and this probably entails many constraints directly related to the native conformation, but not equivalent to the mere requirement of structural stability. For this and many other potential reasons, among all sequences predicted by classical engineering-oriented protein design, probably only a subset will look like natural proteins.



Journal: BMC Bioinformatics
Publication Date: Jun 29, 2006
Citations: 85
License type: cc-by
