Abstract

Protein structure classification hierarchically clusters domain structures based on structure and/or sequence similarities and plays important roles in the study of protein structure-function relationships and protein evolution. Among the many classifications, SCOP and CATH are widely viewed as the gold standards. Fold classification is of special interest because it is the lowest level of classification that does not depend on protein sequence similarity. The current fold classifications, such as those in SCOP and CATH, are controversial because they implicitly assume that folds are discrete islands in the structure space, whereas increasing evidence suggests significant similarities among folds and supports a continuous fold space. Although this problem is widely recognized, its impact on fold classification has not been quantitatively evaluated. Here we develop a likelihood method to classify a domain into the existing folds of CATH or SCOP using both query-fold structure similarities and within-fold structure heterogeneities. The new classification differs from the original classification for 3.4–12% of domains, depending on factors such as the structure similarity score and the original classification scheme used. Because these factors differ for different biological purposes, our results indicate that the importance of considering structure space continuity in fold classification depends on the specific question asked.

Summary

Introduction

Since the 1970s, classification of protein domain structures has gained wide popularity because of its utility in predicting protein function and studying protein evolution[1]. The current fold classification in SCOP and CATH implicitly assumes that different folds represent isolated islands in the structure space. With the explosion of the number of solved domain structures and the use of structure similarity metrics, increasing evidence supports the concept of a continuous fold space where domains from different folds have significant structural similarities[12,13,14,15]. This discovery prompted multiple authors to question the current fold hierarchy[14] and propose alternative representations such as structure similarity networks[16] and maps[17,18]. It is unclear to what degree fold space continuity affects protein structure classification and whether it is legitimate to ignore this continuity in classification. To answer these questions, we propose and implement a strategy to classify domain structures into existing folds by considering fold space continuity. By comparing our new classification with the current CATH and SCOP classifications, we assess the importance of considering fold space continuity in fold classification.
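
To make the idea of combining query-fold similarity with within-fold heterogeneity concrete, the following is a minimal sketch only. It is not the likelihood method of this paper, whose details are not given in this section. It assumes, purely for illustration, that similarity scores within a fold are roughly Gaussian, and every name and number in it (`fold_log_likelihood`, `classify`, the fold labels, the scores) is hypothetical.

```python
import math
from statistics import mean, stdev


def gaussian_log_pdf(x, mu, sigma):
    """Log density of a normal distribution N(mu, sigma^2) at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def fold_log_likelihood(query_scores, within_fold_scores):
    """Log-likelihood of the query's similarity scores to a fold's members,
    under a Gaussian fitted to that fold's own member-member scores.
    The Gaussian model is an illustrative assumption."""
    mu = mean(within_fold_scores)
    sigma = max(stdev(within_fold_scores), 1e-6)  # guard against zero spread
    return sum(gaussian_log_pdf(s, mu, sigma) for s in query_scores)


def classify(query_scores_by_fold, within_scores_by_fold):
    """Return the fold with the highest log-likelihood, plus all log-likelihoods."""
    log_liks = {
        fold: fold_log_likelihood(query_scores_by_fold[fold], within_scores_by_fold[fold])
        for fold in query_scores_by_fold
    }
    return max(log_liks, key=log_liks.get), log_liks


# Hypothetical similarity scores on a 0-1 scale (made-up numbers).
within = {
    "tight_fold": [0.80, 0.82, 0.78, 0.81, 0.79],  # homogeneous fold
    "loose_fold": [0.45, 0.60, 0.35, 0.55, 0.40],  # heterogeneous fold
}
query = {
    "tight_fold": [0.55, 0.57, 0.53],  # query vs. members of tight_fold
    "loose_fold": [0.50, 0.48, 0.52],  # query vs. members of loose_fold
}

best, log_liks = classify(query, within)
print(best)
```

In this toy example the query's raw similarity is highest to `tight_fold`, yet the likelihood favors `loose_fold`, because a score near 0.55 is very atypical for a fold whose members all score about 0.8 against each other. This is the kind of reassignment that can arise once within-fold heterogeneity is taken into account.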
