Abstract

Protein remote homology detection is an important task in computational proteomics. Several computational methods have been proposed that detect remotely homologous proteins based on different features and algorithms. As noted in previous studies, their predictions are complementary to each other. It is therefore worth exploring whether these methods can be combined into one package to further improve predictive performance and ease of use. To this end, we introduced a protein representation called the profile-based pseudo protein sequence, which extracts evolutionary information from the relevant profiles. Based on the concept of pseudo proteins, a new predictor called “dRHP-PseRA” was developed by combining four state-of-the-art predictors (PSI-BLAST, HHblits, Hmmer, and Coma) via a rank aggregation approach. Cross-validation tests on a SCOP benchmark dataset showed that the new predictor outperformed each of the existing methods for the same purpose in terms of ROC50 scores. Accordingly, dRHP-PseRA is expected to be a useful high-throughput tool for detecting remotely homologous proteins. For the convenience of experimental scientists, a web server for dRHP-PseRA has been established at http://bioinformatics.hitsz.edu.cn/dRHP-PseRA/.
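To make the idea of rank aggregation concrete, the following is a minimal sketch of one simple scheme, a Borda count over the ranked hit lists of several predictors. The abstract does not state which aggregation rule dRHP-PseRA uses, so the Borda rule, the function name borda_aggregate, and the toy protein IDs are all assumptions chosen for illustration only.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several ranked hit lists into one consensus ranking.

    rankings: dict mapping predictor name -> list of hit IDs, best first.
    Returns hit IDs sorted by total Borda points (highest first).
    NOTE: Borda count is an illustrative choice, not necessarily the
    aggregation rule used by dRHP-PseRA.
    """
    # Every hit reported by at least one predictor competes in the consensus.
    universe = {hit for hits in rankings.values() for hit in hits}
    n = len(universe)
    points = defaultdict(int)
    for hits in rankings.values():
        for position, hit in enumerate(hits):
            # Top hit earns n points, the next n - 1, and so on;
            # hits a predictor does not report earn 0 from it.
            points[hit] += n - position
    # Sort by points descending, then by ID for deterministic tie-breaking.
    return sorted(universe, key=lambda h: (-points[h], h))


# Hypothetical toy rankings; the hit IDs are made up for illustration.
toy_rankings = {
    "PSI-BLAST": ["d1", "d2", "d3"],
    "HHblits": ["d2", "d1", "d4"],
    "Hmmer": ["d2", "d3", "d1"],
    "Coma": ["d1", "d2", "d4"],
}
consensus = borda_aggregate(toy_rankings)
print(consensus)
```

In this toy case a hit that is ranked consistently high by most predictors rises to the top of the consensus list, which is the intuition behind combining complementary predictors.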

Highlights

  • Protein remote homology detection is an important task in computational proteomics

  • The pseudo protein representation can improve the performance of PSI-BLAST, Hmmer, and Coma, as reflected by both the ROC1 and ROC50 scores

  • Protein remote homology detection is a key technique for studying protein structures and functions
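The ROC1 and ROC50 scores mentioned above are truncated ROC scores: the area under the ROC curve up to the first 1 or 50 false positives, normalized to lie between 0 and 1. The sketch below shows one common way to compute such a score from a ranked list. The function name roc_n is hypothetical, and two conventions are assumptions here: the ranked list is taken to contain every true homolog, and if the list ends before n false positives are seen, the missing false positives are treated as ranked below all listed hits.

```python
def roc_n(labels, n=50):
    """Truncated ROC score (ROC-n) for a ranked hit list.

    labels: booleans in ranked order (best hit first); True marks a true homolog.
    n: number of false positives at which the ROC curve is truncated.
    """
    total_pos = sum(labels)
    if total_pos == 0 or n <= 0:
        return 0.0
    tp_seen = 0
    fp_seen = 0
    area = 0
    for is_pos in labels:
        if is_pos:
            tp_seen += 1
        else:
            fp_seen += 1
            # Each false positive contributes the number of true
            # positives ranked above it.
            area += tp_seen
            if fp_seen == n:
                break
    # Assumption: false positives missing from a short list are treated
    # as ranked below every listed hit.
    area += (n - fp_seen) * tp_seen
    return area / (n * total_pos)


# Toy example: two true homologs interleaved with two false positives.
print(roc_n([True, False, True, False], n=2))
```

A perfect ranking, with every true homolog ahead of the first false positive, scores 1.0 under this definition.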

Summary

Introduction

Protein remote homology detection is an important task in computational proteomics. Several computational methods have been proposed that detect remotely homologous proteins based on different features and algorithms. In the post-genomic age, protein sequence databases (such as UniProtKB[1]) have been greatly enriched by the rapid development of sequencing technology, while the protein structure and function data in PDB[2] are growing much more slowly. This gap keeps widening[3]. Protein remote homology detection has been studied for a long time, and many researchers have proposed approaches to this task. They can be categorized into three groups[4,5,6]: (1) alignment methods, (2) discriminative methods, and (3) ranking methods. PSI-BLAST[9] and IMPALA[10] are two sequence-alignment methods, while COMPASS[11], FFAS[12–14], SPARK-X[15] and COMA[16] are methods based on profile-profile alignment. The latter have achieved much better results.
