Abstract

Independent folding units that can carry out biological functions are classified as “protein domains”. Under evolutionary pressure, domains with similar folds and functions accumulate not only considerable sequence changes but also remarkable length variations. Rapid heuristic sequence search algorithms are generally sensitive and effective at recognizing distantly related protein domains within large sequence databases, but they are not well suited to identifying remote homologues of varying lengths. A further challenge is distinguishing reliable hits from the vast number of putative false positives that show suboptimal sequence similarity. Here, the authors present a data-mining approach that applies stage-specific filters during sequence searches to reliably accumulate remote homologues, favoring the sampling of length variants while maintaining a low false positive rate. Identifying such remote homologues with marked length variations could contribute to a better understanding of functional diversity within protein domain superfamilies.
