Abstract

Background: Domain experts manually construct the Structural Classification of Proteins (SCOP) database to categorize and compare protein structures. Although SCOP is believed to be more reliable than classifications produced by other methods, building it is labor intensive. To mimic the human classification process, we develop an automatic SCOP fold classification system that assigns known SCOP folds to newly discovered proteins and recognizes novel folds.

Results: With a sufficient amount of ground truth data, our system assigns known folds to newly discovered proteins in the latest SCOP v1.69 release with 92.17% accuracy. It also recognizes novel folds with 89.27% accuracy under 10-fold cross validation. The average response time to classify proteins of 500 and 1409 amino acids is 4.1 and 17.4 seconds, respectively. Compared with several structural alignment algorithms, our approach achieves both higher classification accuracy and better efficiency.

Conclusion: In this paper, we build an advanced, non-parametric classifier to accelerate the manual classification process of SCOP. With satisfactory ground truth data from the SCOP database, our approach identifies relevant domain knowledge and yields reasonably accurate classifications. Our system is publicly accessible at .

Highlights

  • Novel SCOP fold recognition: the algorithm detects whether a newly discovered protein structure should be assigned to a novel fold


Summary

Introduction

Domain experts manually construct the Structural Classification of Proteins (SCOP) database to categorize and compare protein structures. The Fold Classification based on Structure-Structure Alignment of Proteins (FSSP) database [8] is built on the Distance matrix ALIgnment (DALI) algorithm [9], which applies Monte Carlo heuristics to compare structural similarity using 2-D distance matrices derived from 3-D protein structures. Such systems rely on structural alignment algorithms to measure the similarity of two proteins, a problem known to be NP-hard [10].
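
To make the "2-D distance matrix mapped from a 3-D structure" idea concrete, the following minimal sketch computes pairwise Euclidean distances between residue positions. It illustrates only the mapping step, not the DALI comparison or its Monte Carlo search. The function name `distance_matrix`, the use of one coordinate per residue (for example the C-alpha atom), and the sample coordinates are assumptions made for illustration and do not come from the paper.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords: sequence of (x, y, z) tuples, one per residue. Using a single
    representative atom per residue is an assumption of this sketch.
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])  # requires Python 3.8+
            matrix[i][j] = d
            matrix[j][i] = d  # distances are symmetric
    return matrix


# Hypothetical coordinates in angstroms, chosen so the distances are easy to check.
ca_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
dm = distance_matrix(ca_coords)
```

Because the matrix depends only on distances within one structure, it is unchanged by rotating or translating the protein, which is one reason distance matrices are a convenient representation for comparing structures.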
