Abstract

Background: Statistical approaches for protein design are relevant in the field of molecular evolutionary studies. In recent years, new, so-called structurally constrained (SC) models of protein-coding sequence evolution have been proposed, which use statistical potentials to assess sequence-structure compatibility. In previous work, we defined a statistical framework for optimizing knowledge-based potentials especially suited to SC models. Our method used the maximum likelihood principle and yielded what we call the joint potentials. However, it required numerical estimation with computationally intensive Markov chain Monte Carlo (MCMC) sampling algorithms.

Results: Here, we develop an alternative optimization procedure, based on a leave-one-out argument coupled to fast gradient descent algorithms. We find that the leave-one-out potential yields results very similar to those of the joint approach developed previously, both in terms of the resulting potential parameters and by Bayes factor evaluation in a phylogenetic context. At the same time, the leave-one-out approach brings a considerable computational benefit (up to a 1,000-fold decrease in computational time for the optimization procedure).

Conclusion: Because of its computational speed, the optimization method we propose offers an attractive alternative for the design and empirical evaluation of alternative forms of potentials, using large data sets and high-dimensional parameterizations.
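The computational contrast between the joint and leave-one-out approaches can be illustrated with a small sketch. It assumes a pairwise contact potential, a toy two-letter residue alphabet, and a pseudo-likelihood reading of the leave-one-out criterion, in which each native residue is predicted from the rest of its sequence and its native structure. These choices, and every function name below, are illustrative assumptions rather than the authors' implementation. What the sketch shows is that each site's normalization is a sum over only K residue types, so the objective and its gradient are exact and cheap to compute. The joint likelihood, by contrast, needs a partition function over all sequences, which is why it calls for MCMC. Plain fixed-step gradient ascent is used here for brevity.

```python
import math

def pair(a, b):
    """Unordered residue-pair key, so the contact energy eps(a, b) is symmetric."""
    return (a, b) if a <= b else (b, a)

def loo_objective(theta, data, K, l2=0.01):
    """Leave-one-out log-likelihood sum_p sum_i log P(s_i | s_-i, c), plus its gradient.

    data: list of (seq, neighbors); seq holds residue indices in range(K) and
    neighbors[i] lists the sites in contact with site i in the native conformation.
    """
    ll = 0.0
    grad = {k: 0.0 for k in theta}
    for seq, neighbors in data:
        for i, s_i in enumerate(seq):
            # Energy of each residue type x at site i, all other sites held fixed.
            e = [sum(theta[pair(x, seq[j])] for j in neighbors[i]) for x in range(K)]
            m = min(e)
            w = [math.exp(m - ex) for ex in e]  # Boltzmann weights shifted for stability
            z = sum(w)
            ll += m - e[s_i] - math.log(z)
            for j in neighbors[i]:
                grad[pair(s_i, seq[j])] -= 1.0          # observed contact
                for x in range(K):
                    grad[pair(x, seq[j])] += w[x] / z   # expected contact under P(x | rest)
    for k in theta:
        # L2 penalty keeps the optimum finite and fixes the constant-shift freedom of energies.
        ll -= l2 * theta[k] ** 2
        grad[k] -= 2.0 * l2 * theta[k]
    return ll, grad

def fit(data, K, steps=300, lr=0.05):
    """Fixed-step gradient ascent on the leave-one-out objective (descent on its negative)."""
    theta = {pair(a, b): 0.0 for a in range(K) for b in range(a, K)}
    for _ in range(steps):
        _, g = loo_objective(theta, data, K)
        for k in theta:
            theta[k] += lr * g[k]
    return theta

# Toy data: residue types 0 and 1; native contacts only ever join identical types.
data = [([0, 0, 1, 1], [[1], [0], [3], [2]])]
theta = fit(data, K=2)
print(theta)  # like-like contacts end with lower (more favorable) energy than (0, 1)
```

In a realistic setting, the fixed-step loop would typically be replaced by a quasi-Newton or conjugate-gradient optimizer, since the objective is smooth and its gradient is available in closed form.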

Highlights

  • Statistical approaches for protein design are relevant in the field of molecular evolutionary studies

  • We introduced a probabilistic framework for protein design purposes based on the maximum likelihood principle [26]

  • $s^p = (s^p_i)_{i=1..n_p}$ is the $p$th native sequence of the dataset, $n_p$ is the length of this sequence, and $c^p$ is the native conformation associated with $s^p$
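The dataset notation in the last highlight ($s^p$, $n_p$, $c^p$) can be made concrete with a small container. This is a hypothetical sketch. The class name `NativeEntry` is invented for illustration, and so is the representation of a conformation $c^p$ as a list of residue-residue contacts; neither is the paper's data format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NativeEntry:
    """One dataset item: native sequence s^p and its native conformation c^p."""
    sequence: str                          # s^p as one-letter amino-acid codes
    contacts: tuple[tuple[int, int], ...]  # c^p, assumed here to be residue contacts

    @property
    def length(self) -> int:
        """n_p, the length of s^p."""
        return len(self.sequence)

    def __post_init__(self) -> None:
        # A conformation may only refer to positions of its own sequence.
        for i, j in self.contacts:
            if i == j or not (0 <= i < self.length and 0 <= j < self.length):
                raise ValueError(f"invalid contact ({i}, {j}) for length {self.length}")

entry = NativeEntry("MKVL", ((0, 2), (1, 3)))
print(entry.length)  # 4
```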


Summary

Introduction

Statistical approaches for protein design are relevant in the field of molecular evolutionary studies. New, so-called structurally constrained (SC) models of protein-coding sequence evolution have been proposed, which use statistical potentials to assess sequence-structure compatibility. By deriving the substitution process from basic principles of population genetics, these models aim to bridge the gap between population genetics and phylogenetics, and to offer a better understanding of the driving forces of the long-term evolutionary process. These mutation-selection models propose that the rate of substitution from sequence s to s' is proportional to the product of a mutational term $\mu(s \to s')$, which depends on the rate of mutation from s to s', and the fixation probability $p_{\mathrm{fix}}(s \to s')$, which depends on the particular model chosen.
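The mutation-selection decomposition can be illustrated with one common choice of fixation function: Kimura's fixation probability relative to a neutral mutant, $S/(1-e^{-S})$, for a scaled selection coefficient $S$. The step $S = -\beta\,[E(s') - E(s)]$ is an assumption made here for illustration. In it, $E$ is a statistical-potential energy and $\beta$ is a hypothetical scaling parameter. As the text notes, $p_{\mathrm{fix}}$ depends on the particular model chosen.

```python
import math

def relative_fixation(S):
    """Kimura-style fixation probability of a mutant relative to a neutral one: S / (1 - exp(-S))."""
    if abs(S) < 1e-8:
        return 1.0 + S / 2.0  # first-order expansion avoids 0/0 at S = 0
    return S / (-math.expm1(-S))  # -expm1(-S) == 1 - exp(-S), accurate for small S

def substitution_rate(mu, energy_s, energy_s_prime, beta=1.0):
    """Q(s -> s') = mu(s -> s') * relative fixation probability.

    Assumed selection coefficient: S = -beta * (E(s') - E(s)), so mutants with
    lower energy (better sequence-structure compatibility) are favored.
    """
    S = -beta * (energy_s_prime - energy_s)
    return mu * relative_fixation(S)

print(substitution_rate(1e-3, -10.0, -10.0))  # neutral: equals mu
print(substitution_rate(1e-3, -10.0, -12.0))  # stabilizing mutant: faster than mu
print(substitution_rate(1e-3, -10.0, -8.0))   # destabilizing mutant: slower than mu
```

This functional form satisfies $f(S)/f(-S) = e^{S}$. With a reversible mutation process, that identity makes the resulting substitution process time-reversible.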
