Abstract

BackgroundIn a previous report (La et al., Proteins, 2005), we have demonstrated that the identification of phylogenetic motifs, protein sequence fragments conserving the overall familial phylogeny, represent a promising approach for sequence/function annotation. Across a structurally and functionally heterogeneous dataset, phylogenetic motifs have been demonstrated to correspond to a wide variety of functional site archetypes, including those defined by surface loops, active site clefts, and less exposed regions. However, in our original demonstration of the technique, phylogenetic motif identification is dependent upon a manually determined similarity threshold, prohibiting large-scale application of the technique.ResultsIn this report, we present an algorithmic approach that determines thresholds without human subjectivity. The approach relies on significant raw data preprocessing to improve signal detection. Subsequently, Partition Around Medoids Clustering (PAMC) of the similarity scores assesses sequence fragments where functional annotation remains in question. The accuracy of the approach is confirmed through comparisons to our previous (manual) results and structural analyses. Triosephosphate isomerase and arginyl-tRNA synthetase are discussed as exemplar cases. A quantitative functional site prediction assessment algorithm indicates that the phylogenetic motif predictions, which require sequence information only, are nearly as good as those from evolutionary trace methods that do incorporate structure.ConclusionThe automated threshold detection algorithm has been incorporated into MINER, our web-based phylogenetic motif identification server. MINER is freely available on the web at . Pre-calculated functional site predictions of the COG database and an implementation of the threshold detection algorithm, in the R statistical language, can also be accessed at the website.

Highlights

  • In a previous report (La et al, Proteins, 2005), we have demonstrated that the identification of phylogenetic motifs, protein sequence fragments conserving the overall familial phylogeny, represent a promising approach for sequence/function annotation

  • Contiguous windows with similar values are the result of a single phylogenetic motifs (PMs), making it difficult to properly count the number of true signals in an unsharpened dataset

  • Because PMs are defined as all overlapping windows scoring past the Phylogenetic similarity z-score (PSZ) threshold, reducing the complexity of the problem to be in line with the number of PMs, versus number of windows, makes intuitive sense

Read more

Summary

Introduction

In a previous report (La et al, Proteins, 2005), we have demonstrated that the identification of phylogenetic motifs, protein sequence fragments conserving the overall familial phylogeny, represent a promising approach for sequence/function annotation. In our original demonstration of the technique, phylogenetic motif identification is dependent upon a manually determined similarity threshold, prohibiting large-scale application of the technique. Due to the exponential growth of genomic and protein sequence data, development of automated strategies for large scale functional site identification is an important post-genomic challenge. Sequence regions with significant variability can be functionally important [6], especially when their composition may define sub-family functional specificity. The Evolutionary Trace (ET) procedure [7], and similar approaches [6,8,9], address this problem by using evolutionary information to identify subfamily conserved, yet overall variable, positions. It has been demonstrated that the ET methods identify statistically significant structural clusters [10] and has subsequently been established in large scale analyses [11]

Objectives
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.