Abstract

Background

Classification of newly resolved protein structures is important in understanding their architectural, evolutionary and functional relatedness to known protein structures. Among various efforts to improve the database of Structural Classification of Proteins (SCOP), automation has received particular attention. Herein, we predict the deepest SCOP structural level that an unclassified protein shares with classified proteins that have an equal number of secondary structure elements (SSEs).

Results

We compute a coefficient of dissimilarity (Ω) between proteins, based on structural and sequence-based descriptors characterising the respective constituent SSEs. For a set of 1,661 pairs of proteins with sequence identity up to 35%, the performance of Ω in predicting shared Class, Fold and Super-family levels is comparable to that of the DaliLite Z score, and Ω shows a greater than four-fold increase in the true positive rate (TPR) for proteins sharing the Family level. On a larger set of 600 domains representing 200 families, the performance of the Z score improves in predicting a shared Family, but it still achieves only about half of the TPR of Ω. The TPR for structures sharing a Super-family is lower than in the first dataset, but Ω performs slightly better than the Z score. Overall, the sensitivity of Ω in predicting a common Fold level is higher than that of the DaliLite Z score.

Conclusion

Classification to a deeper level in the hierarchy is specific and difficult, so the efficiency of Ω may be attractive to the curators and end-users of SCOP. We suggest Ω may be a better measure for structure classification than the DaliLite Z score, with the caveat that we are currently restricted to comparing structures with an equal number of SSEs.

Highlights

  • Classification of newly resolved protein structures is important in understanding their architectural, evolutionary and functional relatedness to known protein structures

  • Structural Classification of Proteins (SCOP) classification generally depends on the presence of common types of secondary structure elements (SSEs), their topological arrangements and connectivity, structural and functional similarity inferred from a common evolutionary origin and sequential relatedness leading to conserved structural signatures important for protein function [11]

  • Hierarchical structural classification is prone to inconsistencies, even within the same classification scheme, due to the difference in the amount of information available to define various levels and the limited availability of expert knowledge

Introduction

Classification of newly resolved protein structures is important in understanding their architectural, evolutionary and functional relatedness to known protein structures. Comparison and classification of newly resolved structures contribute to our understanding of the structural architecture, evolution and function of proteins, especially those with low sequence identity to well characterised proteins [4,5]. This information is important for the identification of new protein folds, drug discovery, and phylogenetic analysis of the proteome. SCOP classification generally depends on the presence of common types of SSEs (at the Class level), their topological arrangements and connectivity (at the Fold level), structural and functional similarity inferred from a common evolutionary origin (at the Super-family level) and sequential relatedness leading to conserved structural signatures important for protein function (at the Family level) [11]. The inclusion of information about SSEs can improve the prediction of protein structural class and fold [12,13,14].
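
The four nested levels described above can be made concrete with a small sketch. It assumes each domain carries a SCOP concise classification string (sccs) of the form class.fold.superfamily.family, such as "a.1.1.2", and returns the deepest level two already classified domains have in common. The function name and the example identifiers are illustrative only. This is not the Ω-based prediction itself, just a way to derive the reference label that such a prediction would be compared against.

```python
from typing import Optional

# SCOP levels in order of increasing depth, matching the four
# dot-separated fields of an sccs string (class.fold.superfamily.family).
SCOP_LEVELS = ("Class", "Fold", "Super-family", "Family")


def deepest_shared_level(sccs_a: str, sccs_b: str) -> Optional[str]:
    """Return the deepest SCOP level shared by two sccs strings, or None.

    Hypothetical helper: the hierarchy is nested, so comparison stops at
    the first field that differs, even if later fields happen to match.
    """
    shared = None
    for level, a, b in zip(SCOP_LEVELS, sccs_a.split("."), sccs_b.split(".")):
        if a != b:
            break
        shared = level
    return shared


if __name__ == "__main__":
    # Illustrative identifiers, assumed for the example.
    print(deepest_shared_level("a.1.1.2", "a.1.1.3"))
    print(deepest_shared_level("a.1.1.2", "b.1.1.2"))
```

Stopping at the first mismatch matters because fold, super-family and family numbers are only meaningful within their parent level: two domains numbered fold 1 in different classes do not share a Fold.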
