Abstract
Genome annotations are accumulating rapidly and depend heavily on automated annotation systems. Many genome centers offer annotation systems, but no one has systematically compared their output to determine accuracy and inherent errors. Errors in annotations are routinely deposited in databases such as NCBI and then used to validate subsequent annotations that contain the same errors. We submitted the genome sequence of the halophilic archaeon Halorhabdus utahensis to three genome annotation services for analysis. We examined the output from each service in a variety of ways in order to compare the methodology and effectiveness of the annotations, as well as to explore the genes, pathways, and physiology of the previously unannotated genome. The annotation services differ considerably in gene calls, features, and ease of use. We had to manually identify the origin of replication and the species-specific consensus ribosome-binding site. Additionally, we conducted laboratory experiments to test H. utahensis growth and enzyme activity. Current annotation practices need to improve in order to more accurately reflect a genome's biological potential. We make specific recommendations that could improve the quality of microbial annotation projects.
Highlights
The field of genomics has become increasingly important in the world of science.
Intron-containing tRNA genes: In reviewing the tRNA gene calls made by each annotation service, we found that Integrated Microbial Genome (IMG) and Rapid Annotation using Subsystems Technology (RAST) called the same 45 tRNAs, while J. Craig Venter Institute (JCVI) called 44.
JCVI used BLAST and Rfam to locate rRNAs, whereas IMG used an IMG RNA database and RAST used a script by Niels Larsen [16].
Summary
The field of genomics has become increasingly important in the world of science. The ability to collect and analyze genomic data provides great potential for the study of life, and is especially useful for multiple organisms living in one community and for organisms that cannot be grown in culture [1]. Cost-effective sequencing methods and tools have outpaced manual annotation as the amount of input data has increased by orders of magnitude. In order to benefit from the power of genomic sequencing, the annotation tools must be reliable and the databases must be consistent. Hundreds of genomes will be submitted to be sequenced and annotated [3]. Every time a particular annotation service repeats a systematic error, the results are deposited into a database. As new annotations are produced by the same service, previously deposited errors are used to validate the newest annotation, which contains the same systematic errors. In this way, systematic errors validate their own repetition, and the databases accumulate incorrect annotations that are particular to each annotation service.
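To make the idea of comparing gene calls across annotation services concrete, the following is a minimal Python sketch, not the procedure used in the study. It assumes each service's protein-coding calls are available as (start, stop, strand) tuples and treats two calls as the same gene when they share a stop coordinate and strand, a common convention that is an assumption here. The `GeneCall` type, the `compare_calls` function, and all coordinates are hypothetical illustrations; only the service names come from the text.

```python
from typing import NamedTuple


class GeneCall(NamedTuple):
    start: int   # coordinate of the first base of the start codon
    stop: int    # coordinate of the last base of the stop codon
    strand: str  # "+" or "-"


def compare_calls(calls_by_service):
    """Classify gene calls by agreement across annotation services.

    Assumption: calls sharing (stop, strand) are the same gene.
    If one service reports two calls with the same key, the later one wins.
    """
    starts_by_gene = {}
    for service, calls in calls_by_service.items():
        for call in calls:
            key = (call.stop, call.strand)
            starts_by_gene.setdefault(key, {})[service] = call.start

    n_services = len(calls_by_service)
    summary = {
        "identical": 0,
        "same_stop_different_start": 0,
        "not_called_by_all": 0,
    }
    for starts in starts_by_gene.values():
        if len(starts) < n_services:
            summary["not_called_by_all"] += 1
        elif len(set(starts.values())) == 1:
            summary["identical"] += 1
        else:
            summary["same_stop_different_start"] += 1
    return summary


# Invented toy coordinates, NOT taken from the H. utahensis genome.
example_calls = {
    "IMG": [GeneCall(100, 400, "+"), GeneCall(500, 900, "+"),
            GeneCall(1500, 1200, "-")],
    "RAST": [GeneCall(100, 400, "+"), GeneCall(530, 900, "+"),
             GeneCall(1500, 1200, "-"), GeneCall(2000, 2300, "+")],
    "JCVI": [GeneCall(100, 400, "+"), GeneCall(500, 900, "+")],
}

summary = compare_calls(example_calls)
print(summary)
```

Keying on the stop coordinate separates two kinds of disagreement: genes that one service misses entirely, and genes that all services find but with different start sites. The second kind is where a manually identified ribosome-binding site would help decide between candidate starts.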