Abstract

Background: Annotation transfer for function and structure within the sequence homology concept essentially requires protein sequence similarity for the secondary structural blocks forming the fold of a protein. A simplistic similarity approach in the case of non-globular segments (coiled coils, low-complexity regions, transmembrane regions, long loops, etc.) is not justified and is a pertinent source of mistaken homologies. Such false similarity arises either from positional sequence conservation that results from a very simple, physically induced pattern, or from integral sequence properties that are critical for function. Furthermore, because the number of well-studied proteins continues to grow only slowly, a search methodology must dive deeper into the sequence similarity space to connect unknown sequences to well-studied, albeit more distant, ones for biological function postulation.

Results: Based on our previous work of dissecting the hidden Markov model (HMMER)-based similarity score into fold-critical and non-globular contributions to improve homology inference, we propose a framework, dissectHMMER, that identifies more fold-related domain hits from standard HMMER searches. Subsequent statistical stratification of the fold-related hits into cohorts of functionally related domains allows function postulation for the query sequence. Briefly, this work addresses the technical problems of how to recognize non-globular parts in the domain model, resolve contradictory HMMER2/HMMER3 results, and evaluate fold-related domain hits for homology. The framework is benchmarked against a set of SCOP-to-Pfam domain models. Despite being a sequence-to-profile method, dissectHMMER performs favorably against a profile-to-profile method, HHsuite/HHsearch. Examples of function annotation using dissectHMMER, including the function discovery of an uncharacterized membrane protein Q9K8K1_BACHD (WP_010899149.1) as a lactose/H+ symporter, are presented. Finally, the dissectHMMER webserver is made publicly available at http://dissecthmmer.bii.a-star.edu.sg.

Conclusions: The proposed framework, dissectHMMER, is faithful to the original inception of the sequence homology concept while improving upon the existing HMMER search tool through the rescue of statistically evaluated false-negative yet fold-related domain hits to the query sequence. Overall, this translates into an opportunity for any novel protein sequence to be functionally characterized.

Reviewers: This article was reviewed by Masanori Arita, Shamil Sunyaev and L. Aravind.

Electronic supplementary material: The online version of this article (doi:10.1186/s13062-015-0068-3) contains supplementary material, which is available to authorized users.

Highlights

  • Annotation transfer for function and structure within the sequence homology concept essentially requires protein sequence similarity for the secondary structural blocks forming the fold of a protein

  • It must be emphasized that no algorithmic changes to the original HMMER code are necessary, since the main computations in dissectHMMER are done after the alignments have been generated

  • The sequence homology concept requires emphasizing the similarity between the structural pieces of an alignment to ensure reasonable (3D-structural) fold similarity and the implied biological function

Summary

Introduction

Annotation transfer for function and structure within the sequence homology concept essentially requires protein sequence similarity for the secondary structural blocks forming the fold of a protein. A simplistic similarity approach in the case of non-globular segments (coiled coils, low-complexity regions, transmembrane regions, long loops, etc.) is not justified and is a pertinent source of mistaken homologies. Such false similarity arises either from positional sequence conservation that results from a very simple, physically induced pattern, or from integral sequence properties that are critical for function. The sequence homology concept [1,2,3] is collectively founded upon the inductive reasoning that a homologous protein group (as an antecedent) shares a high level of sequence similarity (as a consequent) [4,5,6,7,8]. This refers to a high level of similarity among comparable structural elements across the sequences, so that a common structural fold is maintained among these homologs; the fold, in turn, governs the general biological function of the homologous protein family. The only recourse for staying on the correct search path is to rely on the similarity between the structural pieces of the alignment to ensure reasonable fold similarity and the implied biological function.
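
The idea of weighing only the structural pieces of an alignment can be illustrated with a minimal sketch. It assumes that each aligned model position already carries a score contribution and a flag marking whether it belongs to a secondary structural block; the `Column` type, the `dissect_score` function, and the toy numbers are hypothetical and are not taken from dissectHMMER or HMMER. How dissectHMMER actually recognizes non-globular parts and statistically evaluates the hits is not reproduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Column:
    """One aligned model position (hypothetical representation)."""
    score: float         # assumed per-position contribution to the alignment score
    fold_critical: bool  # assumed flag: True if the position lies in a secondary structural block


def dissect_score(columns: Iterable[Column]) -> Tuple[float, float]:
    """Split a total alignment score into fold-critical and non-globular parts."""
    fold = 0.0
    non_globular = 0.0
    for col in columns:
        if col.fold_critical:
            fold += col.score
        else:
            non_globular += col.score
    return fold, non_globular


# Toy alignment: most of the total score comes from non-globular positions.
columns = [
    Column(1.5, True),
    Column(2.0, True),
    Column(4.0, False),
    Column(-0.5, True),
    Column(3.0, False),
]
fold, non_globular = dissect_score(columns)
total = fold + non_globular
print(f"total={total}, fold-critical={fold}, non-globular={non_globular}")
```

In this toy case the total score looks strong, yet only a minority of it is carried by fold-critical positions, which is the kind of hit that a total-score threshold alone could mistake for homology.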
