Abstract

Classifying protein structures is essential for determining their function in bioinformatics. In this work, we use PDB files to extract descriptors based on the structural characteristics of the protein, enriched with biological features of the primary and secondary structure elements. We then build decision trees with the C4.5 algorithm, evaluated by 10-fold cross-validation, to select the most appropriate descriptor features for protein classification based on the SCOP hierarchy. We empirically test the usefulness of the different protein descriptor features and their contribution to the classification precision of the decision tree. We propose a novel approach that transforms the hierarchical SCOP classification problem into a bottom-up classification flow. The results show that this approach is much faster than other algorithms while achieving comparable accuracy: about 80% for DOMAIN recognition and 84% for SUPERFAMILY recognition.

Keywords: Protein Classification, SCOP, Decision Trees, C4.5
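
The abstract does not spell out how the bottom-up classification flow works. One possible reading, sketched here purely as an illustration, is that a classifier predicts the lowest-level label first and the higher SCOP levels are then read off the hierarchy. The label names, the `PARENT` map, and the `toy_predictor` rule are all hypothetical placeholders; the stub stands in for a trained C4.5 tree and is not the authors' method.

```python
# Hypothetical toy slice of a SCOP-like hierarchy: each label maps to its parent.
# These names are illustrative placeholders, not real SCOP identifiers.
PARENT = {
    "domain_A1": "family_A",
    "domain_A2": "family_A",
    "domain_B1": "family_B",
    "family_A": "superfamily_X",
    "family_B": "superfamily_X",
    "superfamily_X": "fold_1",
    "fold_1": "class_alpha",
}


def lift_labels(lowest_label, parent=PARENT):
    """Walk from the predicted lowest-level label up to the root of the hierarchy."""
    chain = [lowest_label]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def toy_predictor(descriptor):
    """Assumed stand-in for a trained decision tree: one threshold on one feature."""
    return "domain_A1" if descriptor["helix_fraction"] > 0.5 else "domain_B1"


def classify_bottom_up(descriptor, predict_lowest=toy_predictor):
    """Predict the lowest level once, then derive every higher level by lookup."""
    return lift_labels(predict_lowest(descriptor))


if __name__ == "__main__":
    print(classify_bottom_up({"helix_fraction": 0.7}))
```

Under this reading, only one classifier runs per protein and the upper levels cost a few dictionary lookups, which would be consistent with the speed advantage the abstract reports. Whether the original work proceeds exactly this way is an assumption.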
