Abstract

Genome annotations are accumulating rapidly and depend heavily on automated annotation systems. Many genome centers offer annotation systems, but no one has compared their output in a systematic way to determine accuracy and inherent errors. Errors in the annotations are routinely deposited in databases such as NCBI and used to validate subsequent annotation errors. We submitted the genome sequence of the halophilic archaeon Halorhabdus utahensis to be analyzed by three genome annotation services. We examined the output from each service in a variety of ways in order to compare the methodology and effectiveness of the annotations, as well as to explore the genes, pathways, and physiology of the previously unannotated genome. The annotation services differ considerably in gene calls, features, and ease of use. We had to manually identify the origin of replication and the species-specific consensus ribosome-binding site. Additionally, we conducted laboratory experiments to test H. utahensis growth and enzyme activity. Current annotation practices need to improve in order to more accurately reflect a genome's biological potential. We make specific recommendations that could improve the quality of microbial annotation projects.

Highlights

  • Intron-containing tRNA genes: In reviewing the tRNA gene calls made by each annotation service, we found that Integrated Microbial Genomes (IMG) and Rapid Annotation using Subsystems Technology (RAST) called the same 45 tRNAs, while the J. Craig Venter Institute (JCVI) called 44

  • JCVI used BLAST and Rfam to locate rRNAs, whereas IMG used an IMG RNA database and RAST used a script by Niels Larsen [16]

Summary

Introduction

The field of genomics has become increasingly important in the world of science. The ability to collect and analyze genomic data provides great potential for the study of life, and is especially useful for multiple organisms living in one community and for organisms that cannot be grown in culture [1]. Cost-effective sequencing methods and tools have outpaced manual annotation as the amount of input data has increased by orders of magnitude. To benefit from the power of genomic sequencing, annotation tools must be reliable and databases must be consistent. Hundreds of genomes will be submitted to be sequenced and annotated [3]. Every time a particular annotation service repeats a systematic error, the results are deposited into a database. As new annotations are produced by the same service, previously deposited errors are used to validate the newest annotation, which contains the same systematic errors. In this way, systematic errors validate their own repetition, and the databases accumulate incorrect annotations that are particular to each annotation service.

