Abstract

Background: Developing sensitive sequence search procedures that detect distant relationships between proteins at the superfamily/fold level remains a major challenge. The intermediate sequence search approach is the most frequently used way of identifying remote homologues effectively. In this study, we examined serine proteases of the prolyl oligopeptidase, rhomboid and subtilisin protein families. We used plant serine proteases from the A. thaliana and O. sativa genomes as queries, and also examined 13 other families of unrelated folds, to identify distant homologues that could not be detected using PSI-BLAST.

Methodology/Principal Findings: We propose starting from multiple queries of classical serine protease members to identify remote homologues in families, using a rigorous approach such as Cascade PSI-BLAST. We found that classical sequence-based approaches such as PSI-BLAST achieved very low sequence coverage in identifying plant serine proteases. When applied to an enriched sequence database of homologous domains, the algorithm achieved an overall average coverage of 88% at the family level and 77% at the superfamily or fold level, with a specificity of ∼100% and a Matthews correlation coefficient of 0.91. A similar approach was applied to 13 other protein families representing every structural class in the SCOP database. Further investigation with statistical tests such as jackknifing helped us better understand the influence of neighbouring protein families.

Conclusions/Significance: Our study suggests that using multiple queries from a family for Cascade PSI-BLAST searches is useful for predicting distant relationships effectively, even at the superfamily level. We propose a generalized strategy to cover all the distant members of a particular family using multiple query sequences.
Our findings reveal that the prior selection of query sequences and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the ‘bridging’ role of related families.
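
The coverage, specificity and Matthews correlation coefficient reported above can be illustrated with the standard confusion-matrix definitions. This is a minimal sketch, assuming that coverage means sensitivity (true positives over all true family members) and that true/false positives and negatives are counted per database sequence; the exact counting scheme of the study is not stated here, and the counts in the example are hypothetical, not results from the study.

```python
import math


def search_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Coverage, specificity and Matthews correlation coefficient (MCC)
    from confusion-matrix counts of a homology search (assumed counting)."""
    # Coverage = sensitivity: fraction of true family members that were found.
    coverage = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of unrelated sequences correctly not reported.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # MCC ranges from -1 to 1; 0.0 is returned when any marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"coverage": coverage, "specificity": specificity, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts for illustration only.
    metrics = search_metrics(tp=77, fp=0, tn=900, fn=23)
    print({name: round(value, 3) for name, value in metrics.items()})
```

MCC is included alongside coverage and specificity because it stays informative when true homologues are a small fraction of the database, where specificity alone can be close to 100% even for a weak search.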

Highlights

  • Proteins that belong to the same family, as exemplified by significant sequence similarity, are evolutionarily related and share similar three-dimensional structures and functions

  • Our study suggests that using multiple queries from a family for Cascade PSI-BLAST searches is useful for predicting distant relationships effectively, even at the superfamily level

  • An earlier rigorous analysis of serine proteases by Tripathi and Sowdhamini [21] reported a number of genes encoding prolyl oligopeptidases (POP), rhomboid and subtilisin-like proteins in the genomes of A. thaliana and O. sativa, as shown in Table 1 and Table S1
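
The multiple-query cascade idea can be sketched as a breadth-first expansion in which every newly found sequence becomes a query for the next round. In this minimal sketch, the function `toy_search`, the hit table `HITS` and all sequence names are hypothetical stand-ins for individual PSI-BLAST runs against a database; real searches apply E-value thresholds and profile iterations that are not modelled here. The example shows how two starting queries recover members that one query alone misses, and how `famA_5` is reached only through `bridge_1`, an entry from a neighbouring family.

```python
from typing import Callable, Dict, Iterable, List, Set

# Hypothetical hit table: query -> sequences a single search reports as significant.
HITS: Dict[str, Set[str]] = {
    "famA_1": {"famA_2"},
    "famA_2": {"famA_1", "bridge_1"},
    "famA_3": {"famA_4"},
    "famA_4": {"famA_3"},
    "bridge_1": {"famA_2", "famA_5"},  # member of a neighbouring family
    "famA_5": {"bridge_1"},
}


def toy_search(query: str) -> Set[str]:
    """Stand-in for one PSI-BLAST run (hypothetical; no real search is done)."""
    return HITS.get(query, set())


def cascade_search(queries: Iterable[str],
                   search: Callable[[str], Set[str]],
                   max_rounds: int = 5) -> Set[str]:
    """Re-use every newly found sequence as a query in the next round."""
    found: Set[str] = set(queries)
    frontier: List[str] = sorted(found)
    for _ in range(max_rounds):
        new_hits: List[str] = []
        for query in frontier:
            for hit in search(query):
                if hit not in found:
                    found.add(hit)
                    new_hits.append(hit)
        if not new_hits:  # converged: no new sequences in this round
            break
        frontier = new_hits
    return found


if __name__ == "__main__":
    print(sorted(cascade_search(["famA_1"], toy_search)))
    print(sorted(cascade_search(["famA_1", "famA_3"], toy_search)))
```

Filtering `bridge_1` out of the search results leaves `famA_5` unreachable from `famA_1`, a toy analogue of the ‘bridging’ role of neighbouring families; the `max_rounds` cap is an illustrative safeguard, not a parameter taken from the study.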


Summary

Introduction

Proteins that belong to the same family, as exemplified by significant sequence similarity, are evolutionarily related and share similar three-dimensional structures and functions. Two proteins are said to be remote or distant homologues if the sequence identity between them is low, owing to evolutionary divergence, but they share a common fold and function. Detecting such distant relationships from sequence information alone, among a wide range of unrelated sequences with low sequence identity, remains a challenging task. Because protein sequence space is vast and, compared with structural space, continuously expanding, detecting such distant relationships is a pivotal task in computational biology, and developing sensitive sequence search procedures that detect them at the superfamily/fold level is still a major challenge. Serine proteases of the prolyl oligopeptidase, rhomboid and subtilisin families were examined, using plant serine proteases from the A. thaliana and O. sativa genomes as queries, together with 13 other families of unrelated folds, to identify distant homologues that could not be detected using PSI-BLAST.
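
Because remote homology is characterised here partly by low sequence identity, a small sketch of how percent identity can be computed from a pairwise alignment may help. It assumes the two sequences are already aligned (gaps written as `-`) and divides identical positions by the number of columns in which neither sequence has a gap; this is one common convention among several, and the example strings are arbitrary toy fragments, not sequences from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Convention assumed here: identical residues divided by the number of
    columns in which neither sequence has a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences carry a residue.
    paired = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
              if x != gap and y != gap]
    if not paired:
        return 0.0
    identical = sum(1 for x, y in paired if x == y)
    return 100.0 * identical / len(paired)


if __name__ == "__main__":
    # Arbitrary toy fragments, not sequences from the study.
    print(percent_identity("MKTAYIAK", "MRTAFIAE"))
    print(percent_identity("AC-GT", "ACTGA"))
```

Other definitions (dividing by the full alignment length, or by the length of the shorter ungapped sequence) give different values for the same alignment, so identity figures are only comparable when the convention is stated.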
