Abstract

Background: Different genome annotation services have been developed in recent years and widely used. However, the functional annotation results from different services are often not the same and a scheme to obtain consensus functional annotations by integrating different results is in demand.

Results: This article presents a semi-automated scheme that is capable of comparing functional annotations from different sources and consequently obtaining a consensus genome functional annotation result. In this study, we used four automated annotation services to annotate a newly sequenced genome, Arcobacter butzleri ED-1. Our scheme is divided into annotation comparison and annotation determination sections. In the functional annotation comparison section, we employed gene synonym lists to tackle term difference problems. Multiple techniques from information retrieval were used to preprocess the functional annotations. Based on the functional annotation comparison results, we designed a decision tree to obtain a consensus functional annotation result. Experimental results show that our approach can greatly reduce the workload of manual comparison by automatically comparing 87% of the functional annotations. In addition, it automatically determined 87% of the functional annotations, leaving only 13% of the genes for manual curation. We applied this approach across six phylogenetically different genomes in order to assess the performance consistency. The results showed that our scheme is able to automatically perform, on average, 73% and 86% of the annotation comparison and determination tasks, respectively.

Conclusions: We propose a semi-automatic and effective scheme to compare and determine genome functional annotations. It greatly reduces the manual work required in genome functional annotation. As this scheme does not require any specific biological knowledge, it is readily applicable for genome annotation comparison and genome re-annotation projects.

Highlights

  • Different genome annotation services have been developed in recent years and widely used

  • Identical database IDs denote that at least one type of the database IDs is exactly the same, and this rule applies to the following comparison as well; 2) If annotation texts or terms (Pfam terms, TIGRfam terms, Clusters of Orthologous Groups (COG) terms) are identical, we considered them to be the same annotations; 3) If one annotation is an uninformative annotation and the other annotation has functional annotation, we considered them as different annotations; 4) The matching relationship can be transferred

  • To increase the automated comparison rate, we identified three types of term differences: 1) text variants; 2) synonyms and abbreviations; 3) functional annotation variants
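The comparison rules listed in the highlights can be illustrated with a minimal Python sketch. It is not the authors' implementation: the annotation record layout (`ids`, `text`), the `UNINFORMATIVE` term list, and the function names are all assumptions made for illustration. Text is normalized before matching to absorb simple text variants, and the "matching relationship can be transferred" rule is modelled as transitive grouping with a small union-find.

```python
import re
from itertools import combinations

# Hypothetical list of uninformative annotation texts (an assumption, not from the paper).
UNINFORMATIVE = {"hypothetical protein", "conserved hypothetical protein"}

def normalize(text):
    """Lowercase and replace runs of non-alphanumerics with one space (text variants)."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def is_uninformative(ann):
    return normalize(ann["text"]) in UNINFORMATIVE

def compare(a, b):
    """Return 'same', 'different', or 'undecided' for two annotation records."""
    # Identical database IDs: at least one shared ID type has exactly the same value.
    for id_type in a["ids"].keys() & b["ids"].keys():
        if a["ids"][id_type] == b["ids"][id_type]:
            return "same"
    # Identical annotation texts (after normalization).
    if normalize(a["text"]) == normalize(b["text"]):
        return "same"
    # One uninformative annotation versus one functional annotation.
    if is_uninformative(a) != is_uninformative(b):
        return "different"
    return "undecided"

def group_matches(annotations):
    """Group annotation indices so that 'same' relationships are transitive."""
    parent = list(range(len(annotations)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(annotations)), 2):
        if compare(annotations[i], annotations[j]) == "same":
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(annotations)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

In this sketch, two annotations that share no ID type and have different texts stay "undecided" on their own, but still end up in one group if each matches a common third annotation.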

Summary

Introduction

Different genome annotation services have been developed in recent years and widely used. Craig Venter Institute) annotation service [7] and University of Maryland’s IGS (Institute for Genome Sciences) annotation engine [8]. These services can greatly reduce the cost and human effort needed for annotating genome sequences [3,9,10]. However, because they use different annotation methods, they often generate different results, and it is difficult to compare these results and decide which one is more suitable [10,11]. We constructed a baseline method using database ID comparison (EC and gene symbol) and annotation text matching. We found that it can compare only 45% of the annotations for Arcobacter butzleri ED-1 (Arc-ED). We adopted a compromise approach by using majority-supported annotations as the consensus annotations.
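The majority-supported consensus idea can be sketched as follows. This is a simplified illustration, not the decision tree used in the study: the function name `consensus`, the strict-majority threshold, and the convention of returning `"manual"` for unresolved genes are assumptions. Each input label is taken to be an already-compared annotation, so that equal labels mean the services agree.

```python
from collections import Counter

def consensus(labels, min_support=None):
    """Pick the annotation supported by a majority of services.

    labels: one comparable annotation label per service for a single gene.
    min_support: votes required; defaults to a strict majority (an assumption).
    Returns (label, "auto") when a majority agrees, else (None, "manual").
    """
    if not labels:
        return None, "manual"
    needed = min_support if min_support is not None else len(labels) // 2 + 1
    label, count = Counter(labels).most_common(1)[0]
    if count >= needed:
        return label, "auto"
    # No majority: leave the gene for manual curation.
    return None, "manual"
```

With four services, the default threshold requires three agreeing annotations, so a two-against-two split is passed to manual curation rather than resolved arbitrarily.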

