Abstract

Background
The rapid development of structural genomics has resulted in many "unknown function" proteins being deposited in the Protein Data Bank (PDB), and predicting the functions of these proteins has become a challenge for structural bioinformatics. Several sequence-based and structure-based methods have been developed to predict protein function, but they still need to improve in accuracy, sensitivity, and computational speed. Here, an accurate algorithm, the CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict unknown protein functions based on local protein structural similarity. The algorithm has been evaluated on a test set covering 164 enzyme families and compared with other methods.

Results
The evaluation shows that the CMASA is highly accurate (0.96) and sensitive (0.86), and fast enough to be used for large-scale functional annotation. Compared with both sequence-based and global structure-based methods, the CMASA can find not only remote homologous proteins but also active site convergence. Compared with other local structure comparison-based methods, the CMASA performs better than both FFF (a method using geometry to predict protein function) and SPASM (a local structure alignment method); it is also more sensitive than PINTS and more accurate than JESS (both local structure alignment methods). The CMASA was applied to annotate the enzyme catalytic sites of the non-redundant PDB and suggested at least 166 putative catalytic sites that are not recorded in the Catalytic Site Atlas (CSA).

Conclusions
The CMASA is an accurate algorithm for detecting local protein structural similarity, and it holds several advantages in predicting enzyme active sites. The CMASA can be used for large-scale enzyme active site annotation. The CMASA is available through a mail-based server (http://159.226.149.45/other1/CMASA/CMASA.htm).
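The abstract does not describe how CMASA compares contact matrices internally. As a rough illustration of the general idea behind contact- and distance-based local structure matching (not the authors' CMASA implementation), the Python sketch below searches a query structure for residue sets that reproduce a small template's residue types, contact pattern, and pairwise distances, then ranks the hits by distance RMSD (dRMSD). The one-point-per-residue representation, the 8.0 Å contact cutoff, the 1.0 Å distance tolerance, the function names `contact_matrix` and `find_local_matches`, and the example coordinates are all hypothetical choices for this sketch.

```python
import math
from typing import List, Tuple

# Hypothetical representation: one point per residue (e.g. a C-alpha or
# side-chain centroid) tagged with its residue type.
Residue = Tuple[str, Tuple[float, float, float]]


def contact_matrix(coords, cutoff):
    """Boolean matrix: [i][j] is True when residues i and j lie within `cutoff` Å."""
    n = len(coords)
    return [[i != j and math.dist(coords[i], coords[j]) <= cutoff
             for j in range(n)] for i in range(n)]


def find_local_matches(template: List[Residue], query: List[Residue],
                       cutoff: float = 8.0, tol: float = 1.0):
    """Return (query index tuple, dRMSD) for every assignment of query residues
    that matches the template's residue types, contact pattern, and pairwise
    distances (within `tol`), sorted by dRMSD. Illustrative sketch only."""
    t_xyz = [xyz for _, xyz in template]
    q_xyz = [xyz for _, xyz in query]
    t_cm = contact_matrix(t_xyz, cutoff)
    q_cm = contact_matrix(q_xyz, cutoff)
    hits = []

    def extend(assigned):
        k = len(assigned)
        if k == len(template):
            # Distance RMSD over all template pairs; needs no superposition.
            sq = [(math.dist(t_xyz[a], t_xyz[b])
                   - math.dist(q_xyz[assigned[a]], q_xyz[assigned[b]])) ** 2
                  for a in range(k) for b in range(a + 1, k)]
            drmsd = math.sqrt(sum(sq) / len(sq)) if sq else 0.0
            hits.append((tuple(assigned), drmsd))
            return
        res_type = template[k][0]
        for q in range(len(query)):
            if q in assigned or query[q][0] != res_type:
                continue
            # Prune early: contacts and distances to already-assigned residues must agree.
            consistent = all(
                t_cm[k][a] == q_cm[q][assigned[a]]
                and abs(math.dist(t_xyz[k], t_xyz[a])
                        - math.dist(q_xyz[q], q_xyz[assigned[a]])) <= tol
                for a in range(k)
            )
            if consistent:
                extend(assigned + [q])

    extend([])
    return sorted(hits, key=lambda h: h[1])


# Hypothetical three-residue template and a query holding a translated copy of it.
TEMPLATE = [("SER", (0.0, 0.0, 0.0)), ("HIS", (3.0, 0.0, 0.0)), ("ASP", (3.0, 4.0, 0.0))]
QUERY = [("GLY", (10.0, 10.0, 10.0)), ("SER", (1.0, 1.0, 1.0)),
         ("HIS", (4.0, 1.0, 1.0)), ("ASP", (4.0, 5.0, 1.0)), ("ASP", (20.0, 0.0, 0.0))]
print(find_local_matches(TEMPLATE, QUERY))  # [((1, 2, 3), 0.0)]
```

The distant Asp at index 4 is rejected because its contact pattern with the Ser and His differs from the template's. A plain backtracking search like this one is exponential in template size in the worst case; checking contacts as each residue is assigned prunes most candidates early.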

Highlights

  • The rapid development of structural genomics has resulted in many “unknown function” proteins being deposited in the Protein Data Bank (PDB), and predicting the functions of these proteins has become a challenge for structural bioinformatics

  • With the development of both genome projects and structural genomics, large numbers of protein structures of unknown function have been deposited in the PDB, and their functions need to be annotated

  • The structure 2qjw has been deposited by the Joint Center for Structural Genomics (JCSG), but its function is still unknown


Summary

Introduction

The rapid development of structural genomics has resulted in many “unknown function” proteins being deposited in the Protein Data Bank (PDB), and predicting the functions of these proteins has become a challenge for structural bioinformatics. An accurate algorithm, the CMASA (Contact MAtrix based local Structural Alignment algorithm), has been developed to predict unknown protein functions based on local protein structural similarity. This algorithm has been evaluated on a test set covering 164 enzyme families and has been compared with other methods. Sequence-based methods, such as BLAST/PSI-BLAST [1,2] or PROSITE [3], rely on the concept that “similar protein sequences have similar functions”. The performance of these methods depends critically on the sequence similarity between the query protein and the annotated proteins. Sequence-based methods may therefore fail to annotate functionally diverse proteins.
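To illustrate why sequence-based annotation depends so heavily on similarity, here is a minimal Python sketch of annotation transfer by percent identity over pairwise alignments. It assumes the alignments have already been produced (for example by a tool such as BLAST) and are supplied as gapped strings; the 40% identity threshold, the function names `percent_identity` and `transfer_annotation`, and the example sequences and labels are hypothetical. Real tools score hits with alignment statistics such as E-values, which this sketch does not model.

```python
from typing import List, Optional, Tuple

# (aligned query, aligned annotated subject, function label); all hypothetical.
Hit = Tuple[str, str, str]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    cols = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not cols:
        return 0.0
    return 100.0 * sum(x == y for x, y in cols) / len(cols)


def transfer_annotation(hits: List[Hit], threshold: float = 40.0) -> Optional[str]:
    """Return the label of the most identical hit, or None if no hit reaches `threshold`."""
    best_label, best_id = None, threshold
    for query_aln, subject_aln, label in hits:
        pid = percent_identity(query_aln, subject_aln)
        if pid >= best_id:
            best_label, best_id = label, pid
    return best_label


hits = [("ACDE-FG", "ACDQHFG", "hydrolase"), ("ACDEFG", "WYKLMN", "kinase")]
print(transfer_annotation(hits))      # hydrolase (5 of 6 ungapped columns identical)
print(transfer_annotation(hits[1:]))  # None: 0% identity is below the threshold
```

A query whose only annotated relatives fall below the threshold receives no annotation at all, which is the failure mode the paragraph above describes for functionally diverse proteins.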
