Abstract

Protein remote homology detection is one of the most challenging problems in protein sequence analysis, and it is an important step for both theoretical research (such as understanding the structures and functions of proteins) and drug design. Previous studies have shown that combining different ranking methods via the Learning to Rank algorithm is an effective strategy for protein remote homology detection, and that the performance can be further improved by protein similarity networks. In this paper, we improved the ProtDec-LTR1.0 and ProtDec-LTR2.0 predictors by incorporating three profile-based features (Top-1-gram, Top-2-gram, and ACC) into the Learning to Rank framework via feature mapping strategies. The predictive performance was further refined by the PageRank (PR) algorithm and the hyperlink-induced topic search (HITS) algorithm. The resulting predictor is called ProtDec-LTR3.0. Rigorous tests on two widely used benchmark datasets showed that ProtDec-LTR3.0 outperformed ProtDec-LTR1.0, ProtDec-LTR2.0, and nine other existing state-of-the-art predictors, indicating that ProtDec-LTR3.0 is an efficient method for protein remote homology detection and will become a useful tool for protein sequence analysis. A user-friendly web server for ProtDec-LTR3.0 is available at http://bliulab.net/ProtDec-LTR3.0/.

Highlights

  • Proteins that belong to the same superfamily but to different families are remote homologous proteins [1]

  • The sequence similarity between remote homologous proteins is usually less than 40%, while homologous proteins usually share less than 95% sequence similarity [1]

  • Can the profile-based features be incorporated into the ranking methods? To answer this question, we proposed a feature mapping method that incorporates the profile-based features into the Learning to Rank algorithm, and combined it with the PageRank algorithm and the Hyperlink-Induced Topic Search (HITS) algorithm [29] to further improve the accuracy of protein remote homology detection. The resulting predictor, ProtDec-LTR3.0, is an improved version of ProtDec-LTR1.0 [1] and ProtDec-LTR2.0 [28]


Summary

INTRODUCTION

Proteins that belong to the same superfamily but to different families are remote homologous proteins [1]. Several powerful features have been proposed for detecting them. Learning to Rank [34] is one of the most powerful machine learning techniques; it has been applied to protein remote homology detection and has shown promising predictive performance [1], [28]. This raises the question of whether profile-based features can be incorporated into the ranking methods. To answer it, we proposed a feature mapping method that incorporates the profile-based features into the Learning to Rank algorithm, and combined it with the PageRank algorithm and the HITS algorithm [29] to further improve the accuracy of protein remote homology detection. The resulting predictor, ProtDec-LTR3.0, is an improved version of ProtDec-LTR1.0 [1] and ProtDec-LTR2.0 [28].
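To make the network-based refinement step more concrete, the following is a minimal sketch of the two link-analysis algorithms named above, PageRank and HITS, run on a toy protein similarity network. It is not the ProtDec-LTR3.0 implementation: the protein names, the edge weights, the damping factor, and the function names `pagerank` and `hits` are all assumptions made for illustration, and how the paper actually builds its network from ranking results is not described in this summary.

```python
def pagerank(adj, damping=0.85, iters=100):
    """Power-iteration PageRank on a weighted graph {node: {neighbor: weight}}.

    Every neighbor must also appear as a key of `adj`.
    """
    nodes = sorted(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Teleportation term shared by every node.
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            total = sum(adj[u].values())
            if total == 0:
                # Dangling node: spread its rank uniformly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
            else:
                # Distribute rank along edges in proportion to their weight.
                for v, w in adj[u].items():
                    new[v] += damping * rank[u] * w / total
        rank = new
    return rank


def hits(adj, iters=100):
    """Iterative HITS; returns (hub, authority) scores, each summing to 1."""
    nodes = sorted(adj)
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Authority of v: weighted sum of hub scores of nodes linking to v.
        auth = {v: 0.0 for v in nodes}
        for u in nodes:
            for v, w in adj[u].items():
                auth[v] += w * hub[u]
        norm = sum(auth.values()) or 1.0
        auth = {v: s / norm for v, s in auth.items()}
        # Hub of u: weighted sum of authority scores of nodes u links to.
        hub = {u: sum(w * auth[v] for v, w in adj[u].items()) for u in nodes}
        norm = sum(hub.values()) or 1.0
        hub = {v: s / norm for v, s in hub.items()}
    return hub, auth


# Hypothetical similarity network: symmetric weights stand in for
# pairwise similarity scores between a query and three database proteins.
network = {
    "query": {"P1": 0.9, "P2": 0.2},
    "P1": {"query": 0.9, "P2": 0.8, "P3": 0.7},
    "P2": {"query": 0.2, "P1": 0.8},
    "P3": {"P1": 0.7},
}

if __name__ == "__main__":
    pr = pagerank(network)
    hub, auth = hits(network)
    for name in sorted(pr, key=pr.get, reverse=True):
        print(f"{name}: PageRank={pr[name]:.3f}  authority={auth[name]:.3f}")
```

In this sketch both scores reward proteins that are strongly connected to other well-connected proteins, which is the general intuition behind using link analysis to re-rank candidate homologs. On an undirected network such as this one, hub and authority scores coincide.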

