Abstract

Background: The expressed sequence tag (EST) methodology is an attractive option for generating sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets pose a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools that annotate non-model species EST datasets with well-defined ontologies, enabling meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO terms, EC numbers and KEGG pathways.

Results: annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation, and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated, and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations are retrieved from the reference database. The annotations are saved both as flat files and in a relational postgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools.

Conclusion: annot8r assigns GO, EC and KEGG annotations to datasets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, together with the program's flexibility and ease of use, make it ideally suited for non-model species EST-sequencing projects.
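The abstract describes a reference database from which one sequence subset per annotation type (GO, EC, KEGG) is extracted. As a minimal sketch of that step, assuming a hypothetical one-table schema `annotation(accession, kind, term)` and using Python's built-in `sqlite3` module as a stand-in for the postgreSQL reference database, subset extraction might look like this. The function names are illustrative only, not annot8r's own code.

```python
import sqlite3


def build_reference_db(annotations):
    """Load (accession, kind, term) rows into an in-memory database.

    sqlite3 stands in for the postgreSQL reference database; the table
    and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE annotation (accession TEXT, kind TEXT, term TEXT)")
    # Index so that accession -> term look-ups stay fast on large datasets.
    conn.execute("CREATE INDEX idx_acc_kind ON annotation (accession, kind)")
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)", annotations)
    return conn


def extract_subset(conn, kind, sequences):
    """Return FASTA text for entries carrying at least one annotation of `kind`."""
    rows = conn.execute(
        "SELECT DISTINCT accession FROM annotation WHERE kind = ? ORDER BY accession",
        (kind,),
    )
    records = []
    for (accession,) in rows:
        if accession in sequences:  # skip accessions with no stored sequence
            records.append(f">{accession}\n{sequences[accession]}")
    return "\n".join(records) + "\n" if records else ""


# Toy data: two hypothetical UniProt-style entries.
annotations = [
    ("P12345", "GO", "GO:0005515"),
    ("P12345", "EC", "2.7.11.1"),
    ("Q67890", "GO", "GO:0003677"),
    ("Q67890", "KEGG", "K00001"),
]
sequences = {"P12345": "MKTAYIAK", "Q67890": "MSDNELQK"}
conn = build_reference_db(annotations)
subsets = {kind: extract_subset(conn, kind, sequences) for kind in ("GO", "EC", "KEGG")}
print(subsets["EC"])
```

In a pipeline like the one described, each FASTA subset would then be formatted as a BLAST database. Keeping the annotations in an indexed relational table means the later step from a BLAST hit back to its GO, EC or KEGG terms is a single query.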

Highlights

  • The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available

  • The situation is different for non-model species, where the core of the available sequence data often comes from expressed sequence tags (ESTs)

  • We have developed annot8r, a software tool that facilitates the annotation of new sequences with Gene Ontology (GO) terms, Enzyme Commission (EC) numbers and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways based on similarity searches against annotated subsets of the EMBL UniProt database [6]. annot8r is a generic tool that can be used for automated annotation of any set of protein sequences, but it has been written predominantly for the annotation of EST datasets
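Annotation by similarity search amounts to transferring terms from significant hits in an annotated subset to the query sequence. The sketch below assumes 12-column tabular BLAST output (the format produced by `-outfmt 6` in BLAST+ or `-m 8` in legacy `blastall`). Its E-value cutoff and maximum number of hits are hypothetical defaults, not annot8r's own settings, and the helpers `parse_tabular` and `transfer_annotations` are illustrative names.

```python
from collections import defaultdict


def parse_tabular(lines):
    """Yield (query, subject, evalue, bitscore) from 12-column tabular BLAST output."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            raise ValueError(f"expected 12 tab-separated columns, got {len(fields)}")
        # Columns 11 and 12 (0-based 10 and 11) hold the E-value and bitscore.
        yield fields[0], fields[1], float(fields[10]), float(fields[11])


def transfer_annotations(hits, lookup, max_evalue=1e-10, max_hits=10):
    """Assign each query the terms of its top-scoring hits (hypothetical defaults).

    For every term, keep the best bitscore among supporting hits so that
    terms can be ranked by the strength of their evidence.
    """
    per_query = defaultdict(list)
    for query, subject, evalue, bitscore in hits:
        if evalue <= max_evalue:
            per_query[query].append((bitscore, subject))

    result = {}
    for query, scored in per_query.items():
        scored.sort(reverse=True)  # highest bitscore first
        terms = {}
        for bitscore, subject in scored[:max_hits]:
            for term in lookup.get(subject, ()):
                terms[term] = max(terms.get(term, 0.0), bitscore)
        result[query] = terms
    return result


blast_output = [
    "contig1\tP12345\t85.0\t120\t18\t0\t1\t120\t5\t124\t1e-50\t200.0",
    "contig1\tQ67890\t40.0\t110\t66\t2\t3\t112\t8\t117\t1e-12\t60.5",
    "contig2\tQ67890\t30.0\t90\t63\t4\t1\t90\t1\t90\t0.5\t25.0",
]
go_lookup = {"P12345": ["GO:0005515"], "Q67890": ["GO:0003677", "GO:0005515"]}
assigned = transfer_annotations(parse_tabular(blast_output), go_lookup)
```

Here `contig2` receives no terms because its only hit fails the E-value cutoff. Keeping the best supporting bitscore per term means a term backed by a strong hit is not treated the same as one supported only by a marginal hit.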


Summary

Background

Protein sequences from model organisms are generally well annotated. The situation is different for non-model species, where the core of the available sequence data often comes from expressed sequence tags (ESTs).

BMC Bioinformatics 2008, 9:180 http://www.biomedcentral.com/1471-2105/9/180

There is a need for user-friendly and easy-to-use tools to assist in the functional annotation of sequences for non-model organisms on this scale. Annotation for such projects has been based on the descriptor of the best BLAST hit.

The KEGG consortium provides a complete set of UniProt proteins that have a KEGG Orthology category attached. All information from these files relevant for the annotation process is read into a postgreSQL reference database for efficient look-up.

Similarity searches and generation of the results database

To start the BLAST searches against each of these three UniProt subsets, the user has to provide the sequences to be annotated as an input file in multi-FASTA format. Detailed examples illustrating this are given in the tutorial part of the user guide.
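The multi-FASTA input step can be sketched as a small reader that rejects malformed input before any search is started. This is a minimal illustration, not annot8r's parser: the duplicate-identifier check, the nucleotide-composition heuristic and its 0.9 threshold are assumptions, as is the choice of `blastx` for nucleotide queries and `blastp` for protein queries.

```python
NUCLEOTIDE = set("ACGTUN")


def read_multifasta(text):
    """Parse multi-FASTA text into a dict of identifier -> sequence (input order kept)."""
    records = {}
    current = None
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError(f"line {lineno}: empty FASTA header")
            current = header[0]  # identifier is the first word of the header
            if current in records:
                raise ValueError(f"line {lineno}: duplicate identifier {current!r}")
            records[current] = []
        elif current is None:
            raise ValueError(f"line {lineno}: sequence data before first header")
        else:
            records[current].append(line.upper())
    return {seq_id: "".join(parts) for seq_id, parts in records.items()}


def looks_like_nucleotide(sequence, threshold=0.9):
    """Heuristic: nucleotide if at least `threshold` of residues are A, C, G, T, U or N."""
    if not sequence:
        return False
    return sum(ch in NUCLEOTIDE for ch in sequence) / len(sequence) >= threshold


fasta = """>contig1 some description
ATGGCTAGCTAGGATC
CGATCGATNNA
>contig2
MKTAYIAKQRQISFVK
"""
seqs = read_multifasta(fasta)
# Hypothetical program choice: translate nucleotide queries (blastx), search proteins directly (blastp).
programs = {sid: ("blastx" if looks_like_nucleotide(s) else "blastp") for sid, s in seqs.items()}
```

Wrapped sequence lines are joined, and only the first word of each header is used as the identifier, so descriptions after the identifier do not affect matching of results back to queries.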

Results and Discussion
Conclusion
References
The UniProt Consortium
10. Blaxter ML
16. Johnston IA
