Abstract
Background: The functional annotation of proteins relies on published information concerning their close and remote homologues in sequence databases. Evidence for remote sequence similarity can be further strengthened by a similar biological background of the query sequence and the identified database sequences. However, few tools exist so far that provide a means to include functional information in sequence database searches.
Results: We present ProFAT, a web-based tool for the functional annotation of protein sequences based on remote sequence similarity. ProFAT combines sensitive sequence database search methods and a fold recognition algorithm with a simple text-mining approach. ProFAT extracts identified hits based on their biological background by keyword-mining of annotations, features and, most importantly, literature associated with a sequence entry. A user-provided keyword list enables the user to search specifically for weak but biologically relevant homologues of an input query. The ProFAT server has been evaluated using the complete set of proteins from three different domain families, including their weak relatives, and correctly identified between 90% and 100% of all domain family members studied in this context. ProFAT has furthermore been applied to a variety of proteins from different cellular contexts, and we provide evidence of how ProFAT can help in the functional prediction of proteins based on remotely conserved proteins.
Conclusion: By employing sensitive database search programs and exploiting the functional information associated with database sequences, ProFAT can detect remote but biologically relevant relationships between proteins and will assist researchers in predicting protein function based on remote homologies.
Summary
The functional annotation of proteins relies on published information concerning their close and remote homologues in sequence databases. Functional prediction relies mostly on the similarity between sequences, and standard sequence similarity search tools have been successfully applied in protein functional annotation, provided that the similarity between related proteins is significant enough for sequence-based detection. Where the similarity between related protein sequences is low, profile-based database search methods like PSI-BLAST or HMMer, as well as fold recognition tools, have proven successful in detecting remote homologies and can assist in predicting the function of uncharacterized proteins [1,2]. Characterized proteins have extensive functional information associated with their sequence records. This functional information includes published literature about a protein or gene, functional classifications such as those provided by the Gene Ontology (GO) annotations, conserved domains that potentially link a protein with a molecular function, and sometimes even a short summary of the protein's function. Given the complexity of the output of sequence-based as well as structure-based search techniques, exploiting this functional knowledge is often tedious and involves extensive manual mining for the biological context of the identified database sequences.
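To make the idea of keyword-mining database hits concrete, the following is a minimal sketch of how hits from a sensitive search might be filtered by a user-provided keyword list. It is not ProFAT's implementation: the `Hit` record, the function names `matched_keywords` and `filter_hits`, the `max_evalue` threshold and the example accessions are all hypothetical, and the text fields stand in for whatever annotations, features or literature titles are attached to a sequence entry.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Hit:
    """Hypothetical record for one database hit (not ProFAT's data model)."""
    accession: str
    evalue: float
    # Free text attached to the entry: annotations, features, literature titles.
    texts: List[str] = field(default_factory=list)


def matched_keywords(hit: Hit, keywords: List[str]) -> List[str]:
    """Return the keywords that start a word somewhere in the hit's text."""
    blob = " ".join(hit.texts).lower()
    return [
        kw for kw in keywords
        if re.search(r"\b" + re.escape(kw.lower()), blob)
    ]


def filter_hits(
    hits: List[Hit], keywords: List[str], max_evalue: float = 10.0
) -> List[Tuple[Hit, List[str]]]:
    """Keep hits below a permissive E-value cut-off that mention a keyword.

    The permissive cut-off (an assumption) lets weak hits through, so that
    the keyword match, not the score alone, decides what is reported.
    """
    selected = []
    for hit in hits:
        if hit.evalue > max_evalue:
            continue
        found = matched_keywords(hit, keywords)
        if found:
            selected.append((hit, found))
    # Most keyword matches first; ties broken by the better (lower) E-value.
    selected.sort(key=lambda pair: (-len(pair[1]), pair[0].evalue))
    return selected


# Made-up example hits; accessions and annotations are illustrative only.
example_hits = [
    Hit("HIT_A", 1e-30, ["Serine/threonine protein kinase"]),
    Hit("HIT_B", 2.5, ["Hypothetical protein",
                       "Putative ubiquitin ligase with RING finger"]),
    Hit("HIT_C", 0.8, ["ABC transporter permease"]),
]

for hit, found in filter_hits(example_hits, ["ubiquitin", "ligase"]):
    print(hit.accession, hit.evalue, found)
```

In this sketch the statistically weak hit `HIT_B` is retained because its annotation mentions the keywords, while the strong but unrelated hit `HIT_A` is dropped, which mirrors the motivation of combining remote similarity with biological context.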