Abstract

The classification of protein structures provides preliminary information for further detailed theoretical analyses. Classified information on protein folds might be utilized for structural alignment, while fold class prediction might aid ab initio prediction of protein structures. Here, the structural fold class of proteins was predicted using a torsion-angle-based secondary structure profile library and multi-class linear discriminant analysis. An all-versus-all scheme was used to circumvent the data imbalance inherent in the one-versus-others approach. A tripeptide secondary structure profile library was constructed from non-redundant structure files and used to calculate the probable secondary structure content of protein folds. Using this library, the mean vectors and covariance matrices of the reference classes in the training set were derived. Based on this information, the fold classes of test set proteins were predicted by multi-class linear discriminant analysis. The predictions were highly accurate, as indicated by low error rates. Such accurate fold class prediction might be further applied to secondary structure prediction, exploiting the benefits of larger analysis windows. The results also partly demonstrate the appropriateness of the torsion angle representation in local structure analysis.
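The classification step can be illustrated with a minimal Python sketch of multi-class linear discriminant analysis combined by all-versus-all (pairwise) voting. It is not the authors' implementation. Assumptions: each protein is represented by a small feature vector of secondary structure content (here, helix and strand fractions); each class pair gets a two-class linear discriminant built from a pooled covariance with equal priors; a small ridge term `reg` keeps the covariance invertible; and the function names and the toy class centres are hypothetical, chosen only for illustration.

```python
import itertools
from collections import Counter

import numpy as np


def fit_class_stats(X, y):
    """Per-class mean vector, covariance matrix and sample count (hypothetical helper)."""
    stats = {}
    for label in np.unique(y):
        Xc = X[y == label]
        stats[str(label)] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False), len(Xc))
    return stats


def pairwise_discriminant(stats, a, b, reg=1e-6):
    """Two-class linear discriminant for (a, b) using their pooled covariance."""
    mu_a, cov_a, n_a = stats[a]
    mu_b, cov_b, n_b = stats[b]
    pooled = ((n_a - 1) * cov_a + (n_b - 1) * cov_b) / (n_a + n_b - 2)
    pooled = pooled + reg * np.eye(len(mu_a))  # assumed ridge term for invertibility
    w = np.linalg.solve(pooled, mu_a - mu_b)
    c = -0.5 * w @ (mu_a + mu_b)  # boundary at the midpoint: equal priors assumed
    return w, c  # score > 0 favours class a


def predict_all_versus_all(stats, X):
    """Every class pair casts one vote per sample; the most-voted class wins."""
    labels = sorted(stats)
    votes = [Counter() for _ in range(len(X))]
    for a, b in itertools.combinations(labels, 2):
        w, c = pairwise_discriminant(stats, a, b)
        for i, score in enumerate(X @ w + c):
            votes[i][a if score > 0 else b] += 1
    return [v.most_common(1)[0][0] for v in votes]


# Toy training data: (helix fraction, strand fraction) around invented class centres.
rng = np.random.default_rng(0)
centres = {"all-alpha": [0.60, 0.05], "all-beta": [0.05, 0.50], "alpha/beta": [0.35, 0.25]}
X_train = np.vstack([rng.normal(c, 0.03, size=(30, 2)) for c in centres.values()])
y_train = np.repeat(list(centres), 30)
stats = fit_class_stats(X_train, y_train)
print(predict_all_versus_all(stats, np.array([[0.58, 0.07], [0.04, 0.52]])))
# prints ['all-alpha', 'all-beta']
```

Because each pairwise discriminant is trained only on the two classes it separates, no single classifier sees one small class pitted against all the others pooled together, which is the imbalance that the all-versus-all scheme is meant to avoid.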

Highlights

  • Protein structure is determined by experimental methods including X-ray crystallography and NMR

  • Classified information on protein folds might be utilized for structural alignments, while fold class predictions might help ab initio prediction of protein structures

  • The fold class of proteins was predicted using a tripeptide secondary structure profile library constructed from non-redundant protein structures
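The idea of a tripeptide profile library can be sketched in Python as follows. This is an illustration under stated assumptions, not the published procedure. It assumes a three-state alphabet (H helix, E strand, C coil), that per-residue states were already assigned from backbone torsion angles, and that each tripeptide's profile is the state distribution of its central residue. Query tripeptides absent from the library are simply skipped. The function names and the tiny training set are hypothetical.

```python
from collections import Counter, defaultdict

STATES = "HEC"  # assumed three-state alphabet: helix, strand, coil


def build_tripeptide_library(training_set):
    """Map each tripeptide to the state distribution of its central residue.

    training_set: iterable of (sequence, states) string pairs of equal length,
    with states assumed to be assigned beforehand from torsion angles.
    """
    counts = defaultdict(Counter)
    for seq, ss in training_set:
        for i in range(1, len(seq) - 1):
            counts[seq[i - 1:i + 2]][ss[i]] += 1
    library = {}
    for tri, c in counts.items():
        total = sum(c.values())
        library[tri] = {s: c[s] / total for s in STATES}
    return library


def secondary_structure_content(seq, library):
    """Average the tripeptide profiles along a sequence (unseen tripeptides skipped)."""
    sums = dict.fromkeys(STATES, 0.0)
    n = 0
    for i in range(1, len(seq) - 1):
        profile = library.get(seq[i - 1:i + 2])
        if profile is None:
            continue
        for s in STATES:
            sums[s] += profile[s]
        n += 1
    return {s: (sums[s] / n if n else 0.0) for s in STATES}


# Hypothetical toy training data: one helical and one strand-rich fragment.
training = [("AAEELLKK", "CHHHHHHC"), ("VKVTVEV", "CEEEEEC")]
lib = build_tripeptide_library(training)
print(secondary_structure_content("AEEVKV", lib))
# prints {'H': 0.5, 'E': 0.5, 'C': 0.0}
```

The resulting content vector is the kind of per-protein feature from which class mean vectors and covariance matrices could then be estimated.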


Summary

Introduction

Protein structures are determined by experimental methods, including X-ray crystallography and NMR. As of August 2012, the Protein Data Bank (PDB) held about 83,000 structures, a number that still lags behind the more than 1 million protein sequences then known. Analysis of experimentally determined structures and theoretical modeling of three-dimensional structures from protein sequences are therefore fields of strong interest in computational biology. Many types of discrimination methods exist, including homology searches and local structure delineation. The classification of the currently known protein structures provides preliminary information for further detailed theoretical analyses. Classified information on protein folds might be utilized for structural alignments, while fold class predictions might help ab initio prediction of protein structures.
