Abstract
Comparison of protein structures is important for revealing the evolutionary relationships among proteins, predicting protein functions and predicting protein structures. Many methods have been developed in the past to align two or multiple protein structures. Despite the importance of this problem, rigorous mathematical or statistical frameworks have seldom been pursued for general protein structure comparison. One notable issue in this field is that, although many different distances are used to measure the similarity between protein structures, none of them is a proper distance when protein structures of different sequences are compared. Statistical approaches that treat those non-proper distances or similarity scores as random variables are thus not mathematically rigorous. In this work, we develop a mathematical framework for protein structure comparison by treating protein structures as three-dimensional curves. Using an elastic Riemannian metric on spaces of curves, we can compute the geodesic distance, a proper distance on spaces of curves, between any two protein structures. In this framework, protein structures can be treated as random variables on the shape manifold, and means and covariances can be computed for populations of protein structures. Furthermore, these moments can be used to build Gaussian-type probability distributions of protein structures for use in hypothesis testing. The covariance of a population of protein structures can reveal population-specific variations and help improve structure classification. With curves representing protein structures, the matching is performed using elastic shape analysis of curves, which can effectively model conformational changes and insertions/deletions. We show that our method performs comparably with commonly used methods in protein structure classification on a large manually annotated data set.
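To make the idea of a distance between protein structures treated as three-dimensional curves more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the square-root velocity function (SRVF) representation that is commonly used in elastic shape analysis; the abstract does not name the representation. It also assumes both curves are sampled with the same number of points, and it optimizes only over rotations, not over re-parameterizations. The value it returns is therefore only an upper bound on a full elastic geodesic distance. The function names `srvf` and `rotation_aligned_distance` are hypothetical.

```python
import numpy as np


def _weights(n):
    """Trapezoid-rule weights for n uniform samples on [0, 1]."""
    w = np.full(n, 1.0 / (n - 1))
    w[0] *= 0.5
    w[-1] *= 0.5
    return w


def srvf(curve):
    """Unit-norm square-root velocity function of an (n, 3) curve.

    Assumption: the curve is sampled uniformly in its parameter on [0, 1].
    Differentiation removes translation; unit normalization removes scale.
    """
    curve = np.asarray(curve, dtype=float)
    n = curve.shape[0]
    vel = np.gradient(curve, 1.0 / (n - 1), axis=0)
    speed = np.linalg.norm(vel, axis=1)
    q = vel / np.sqrt(np.maximum(speed, 1e-12))[:, None]
    norm = np.sqrt(np.sum(_weights(n)[:, None] * q**2))
    return q / norm


def rotation_aligned_distance(curve_a, curve_b):
    """Arc length on the unit sphere between two SRVFs after optimal rotation.

    Re-parameterization is NOT optimized here (a simplifying assumption).
    """
    qa, qb = srvf(curve_a), srvf(curve_b)
    if qa.shape != qb.shape:
        raise ValueError("curves must have the same number of sample points")
    w = _weights(qa.shape[0])[:, None]
    # Rotation maximizing the inner product, found by SVD (Kabsch-style).
    m = (w * qb).T @ qa
    u, _, vt = np.linalg.svd(m)
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    inner = np.sum(w * qa * (qb @ rot.T))
    return float(np.arccos(np.clip(inner, -1.0, 1.0)))
```

In this sketch, a curve and a rotated, scaled and translated copy of it are at distance close to zero, while curves of different shape are at a strictly positive distance. A full elastic comparison would additionally search over re-parameterizations of one curve, which is what allows stretching and compression along the backbone to be modeled.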
Highlights
Comparison of protein structures is an important tool for understanding the evolutionary relationships between proteins, predicting protein structures and predicting protein functions [1,2].
The Rand index (RI) measures the percentage of correct decisions over all pair-wise decisions: RI = (TP + TN) / (TP + TN + FP + FN), where a true positive (TP) is a pair of proteins that are in the same class in SCOP and are classified into the same class, and TN (true negative), FP (false positive) and FN (false negative) are defined analogously.
We have developed a mathematical framework for protein structure comparison based on elastic shape analysis, a method originally developed in the field of computer vision and image analysis.
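The pair-wise index described in the highlights above, (TP + TN) / (TP + TN + FP + FN), can be sketched as follows. This is an illustrative example, not the evaluation code used in the work. The function name `rand_index` and the label lists are hypothetical; `true_labels` stands in for reference classes such as those from SCOP, and `pred_labels` for the classes assigned by a classification method.

```python
from itertools import combinations


def rand_index(true_labels, pred_labels):
    """Fraction of protein pairs on which two labelings agree.

    A pair counts as TP if both labelings put the two proteins in the same
    class, TN if both put them in different classes, FP if only the predicted
    labeling groups them, and FN if only the reference labeling groups them.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have the same length")
    tp = tn = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_true and same_pred:
            tp += 1
        elif not same_true and not same_pred:
            tn += 1
        elif same_pred:
            fp += 1
        else:
            fn += 1
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("need at least two items to form a pair")
    return (tp + tn) / total
```

For example, with reference classes `["a", "a", "b", "b"]` and predicted classes `[0, 0, 1, 2]`, five of the six pairs are decided correctly (one TP, four TN, one FN), so the index is 5/6.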
Summary
Comparison of protein structures (or structure alignment) is an important tool for understanding the evolutionary relationships between proteins, predicting protein structures and predicting protein functions [1,2]. In annotating functions of new proteins, such as those solved in structural genomics projects, sequence alignment methods may not be sufficient to identify functionally related proteins when the sequence identities between the query protein and its related proteins are low (i.e. lower than 20%) [3]. Despite extensive studies in the past, structure alignment, especially flexible structural alignment (i.e. where one of the structures has undergone some conformational changes), continues to be a very challenging problem [37,38,39]. Another problem in structure alignment is assessing the statistical significance of the similarity between two protein structures. This problem is partly due to the lack of a proper metric for measuring the distance between two protein structures [40]. They suffer from the same drawback of RMSD as ...