Abstract

Background: Rapid annotation and comparisons of genomes from multiple isolates (pan-genomes) is becoming commonplace due to advances in sequencing technology. Genome annotations can contain inconsistencies and errors that hinder comparative analysis even within a single species. Tools are needed to compare and improve annotation quality across sets of closely related genomes.

Results: We introduce a new tool, Mugsy-Annotator, that identifies orthologs and evaluates annotation quality in prokaryotic genomes using whole genome multiple alignment. Mugsy-Annotator identifies anomalies in annotated gene structures, including inconsistently located translation initiation sites and disrupted genes due to draft genome sequencing or pseudogenes. An evaluation of species pan-genomes using the tool indicates that such anomalies are common, especially at translation initiation sites. Mugsy-Annotator reports alternate annotations that improve consistency and are candidates for further review.

Conclusions: Whole genome multiple alignment can be used to efficiently identify orthologs and annotation problem areas in a bacterial pan-genome. Comparisons of annotated gene structures within a species may show more variation than is actually present in the genome, indicating errors in genome annotation. Our new tool Mugsy-Annotator assists re-annotation efforts by highlighting edits that improve annotation consistency.

Highlights

  • Rapid annotation and comparisons of genomes from multiple isolates is becoming commonplace due to advances in sequencing technology

  • Often a single reference genome is insufficient to describe the genetic diversity of a species, leading to sequencing of many closely related isolates and subsequent comparative analysis

  • We introduce a new tool, Mugsy-Annotator, that uses whole genome multiple alignment for two objectives: 1) identifying orthologs and 2) evaluating the quality of annotated gene structures in prokaryotic genomes

Summary

Introduction

Often a single reference genome is insufficient to describe the genetic diversity of a species, leading to sequencing of many closely related isolates and subsequent comparative analysis. To aid in the analysis, an annotation process is typically performed using computational methods that include prediction of genes and their functions. Gene prediction has known limitations: it is difficult to accurately identify translation initiation sites (TIS) and pseudogenes, and GC-rich genomes tend to be over-annotated [5]. Post-processing can be used to identify annotation anomalies, as in GenePRIMP [7].
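To make the idea of inconsistently located translation initiation sites concrete, the following is a minimal sketch, not Mugsy-Annotator's actual algorithm. It assumes each annotated gene in an ortholog group has already been projected onto columns of a whole genome multiple alignment, and it flags genes whose annotated start column differs from the most common one in the group. The `AlignedGene` record, the `flag_inconsistent_starts` function, the majority-vote rule, and the example coordinates are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedGene:
    """Hypothetical record: an annotated gene projected onto alignment columns."""
    genome: str
    gene_id: str
    start_col: int  # alignment column of the annotated start codon
    end_col: int    # alignment column of the annotated stop codon


def flag_inconsistent_starts(ortholog_group):
    """Return the most common start column and the genes that deviate from it.

    Assumption: a simple majority vote stands in for whatever scoring a real
    tool would use to pick the preferred translation initiation site.
    """
    counts = Counter(gene.start_col for gene in ortholog_group)
    consensus_col, _ = counts.most_common(1)[0]
    outliers = [g for g in ortholog_group if g.start_col != consensus_col]
    return consensus_col, outliers


# Illustrative data only: three isolates sharing a stop codon column,
# one of which is annotated with a different start column.
group = [
    AlignedGene("isolateA", "A_0001", 120, 1020),
    AlignedGene("isolateB", "B_0001", 120, 1020),
    AlignedGene("isolateC", "C_0001", 150, 1020),
]

consensus, outliers = flag_inconsistent_starts(group)
for gene in outliers:
    print(f"{gene.gene_id}: start column {gene.start_col}, group consensus {consensus}")
```

Genes flagged this way are only candidates for manual review; a differing start column can reflect real biological variation rather than an annotation error.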
