Abstract

BackgroundProtein structure comparison is a central issue in structural bioinformatics. The standard dissimilarity measure for protein structures is the root mean square deviation (RMSD) of representative atom positions such as α-carbons. To evaluate the RMSD the structures under comparison must be superimposed optimally so as to minimize the RMSD. How to evaluate optimal fits becomes a matter of debate, if the structures contain regions which differ largely - a situation encountered in NMR ensembles and proteins undergoing large-scale conformational transitions.ResultsWe present a probabilistic method for robust superposition and comparison of protein structures. Our method aims to identify the largest structurally invariant core. To do so, we model non-rigid displacements in protein structures with outlier-tolerant probability distributions. These distributions exhibit heavier tails than the Gaussian distribution underlying standard RMSD minimization and thus accommodate highly divergent structural regions. The drawback is that under a heavy-tailed model analytical expressions for the optimal superposition no longer exist. To circumvent this problem we work with a scale mixture representation, which implies a weighted RMSD. We develop two iterative procedures, an Expectation Maximization algorithm and a Gibbs sampler, to estimate the local weights, the optimal superposition, and the parameters of the heavy-tailed distribution. Applications demonstrate that heavy-tailed models capture differences between structures undergoing substantial conformational changes and can be used to assess the precision of NMR structures. By comparing Bayes factors we can automatically choose the most adequate model. Therefore our method is parameter-free.ConclusionsHeavy-tailed distributions are well-suited to describe large-scale conformational differences in protein structures. A scale mixture representation facilitates the fitting of these distributions and enables outlier-tolerant superposition.

Highlights

  • Protein structure comparison is a central issue in structural bioinformatics

  • We show that learning these models solves the superposition problem in the presence of gross structural transitions and is equivalent to a weighted root mean square deviation (RMSD) whose weights are updated iteratively

  • Frames of reference in protein structure comparison Rather than adopting a coordinate system prescribed by physical principles such as the Eckart frame, we are interested in finding a frame of reference that overlays heterogenous protein structures such that they share a tight, maximally large structural core at the cost of large-scale outliers

Read more

Summary

Introduction

Protein structure comparison is a central issue in structural bioinformatics. The standard dissimilarity measure for protein structures is the root mean square deviation (RMSD) of representative atom positions such as αcarbons. To evaluate the RMSD the structures under comparison must be superimposed optimally so as to minimize the RMSD. How to evaluate optimal fits becomes a matter of debate, if the structures contain regions which differ largely - a situation encountered in NMR ensembles and proteins undergoing large-scale conformational transitions. Conformational heterogeneity is a common theme in protein structures and relevant in a wide range of different contexts. Proteins are flexible macromolecules whose function is often accompanied by a structural transition [1,2]. Structural heterogeneity may stem from a lack of data. NMR structures are usually represented as ensembles of conformers that fit the data well [5]. Structural heterogeneity mainly reflects a scarcity of restraints and not necessarily true conformational flexibility

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.