Abstract

BackgroundPredicting protein function has become increasingly demanding in the era of next generation sequencing technology. The task to assign a curator-reviewed function to every single sequence is impracticable. Bioinformatics tools, easy to use and able to provide automatic and reliable annotations at a genomic scale, are necessary and urgent. In this scenario, the Gene Ontology has provided the means to standardize the annotation classification with a structured vocabulary which can be easily exploited by computational methods.ResultsArgot2 is a web-based function prediction tool able to annotate nucleic or protein sequences from small datasets up to entire genomes. It accepts as input a list of sequences in FASTA format, which are processed using BLAST and HMMER searches vs UniProKB and Pfam databases respectively; these sequences are then annotated with GO terms retrieved from the UniProtKB-GOA database and the terms are weighted using the e-values from BLAST and HMMER. The weighted GO terms are processed according to both their semantic similarity relations described by the Gene Ontology and their associated score. The algorithm is based on the original idea developed in a previous tool called Argot. The entire engine has been completely rewritten to improve both accuracy and computational efficiency, thus allowing for the annotation of complete genomes.ConclusionsThe revised algorithm has been already employed and successfully tested during in-house genome projects of grape and apple, and has proven to have a high precision and recall in all our benchmark conditions. It has also been successfully compared with Blast2GO, one of the methods most commonly employed for sequence annotation. The server is freely accessible at http://www.medcomp.medicina.unipd.it/Argot2.

Highlights

  • Predicting protein function has become increasingly demanding in the era of generation sequencing technology

  • We present Argot2 (Annotation Retrieval of Gene Ontology Terms), a tool designed for highthroughput annotation of large sequence data sets with high efficiency and precision

  • Assuming that the set V is ordered, it is possible to establish a one-to-one correspondence between the ith Gene Ontology (GO) term gi Î V used for the annotation, its weight wi and the e-value scores Si and Si’ given by BLAST and HMMER

Read more

Summary

Introduction

Predicting protein function has become increasingly demanding in the era of generation sequencing technology. Bioinformatics tools, easy to use and able to provide automatic and reliable annotations at a genomic scale, are necessary and urgent. In this scenario, the Gene Ontology has provided the means to standardize the annotation classification with a structured vocabulary which can be exploited by computational methods. The task to assign a curator-reviewed function to every single sequence is unworkable, calling for efficient/effective methods to assign automatic annotation are necessary as a first analysis step to support working hypotheses. In the category of sequence-based methods, the simple search for homologous sequences is considered a common practice for function prediction based on annotation transfer, and BLAST [6] can be considered a gold standard. If its classical pairwise alignment engine fails, the profile based PSI-BLAST [6] is able to identify relationship among distantly related proteins

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call