Abstract

BackgroundMultiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms.Methodology/Principal FindingsWe describe a new method to align two or more genomes that have undergone rearrangements due to recombination and substantial amounts of segmental gain and loss (flux). We demonstrate that the new method can accurately align regions conserved in some, but not all, of the genomes, an important case not handled by our previous work. The method uses a novel alignment objective score called a sum-of-pairs breakpoint score, which facilitates accurate detection of rearrangement breakpoints when genomes have unequal gene content. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The new genome alignment algorithm demonstrates high accuracy in situations where genomes have undergone biologically feasible amounts of genome rearrangement, segmental gain and loss. We apply the new algorithm to a set of 23 genomes from the genera Escherichia, Shigella, and Salmonella. Analysis of whole-genome multiple alignments allows us to extend the previously defined concepts of core- and pan-genomes to include not only annotated genes, but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46Mbp conserved among all taxa and a pan-genome of 15.2Mbp. We document substantial population-level variability among these organisms driven by segmental gain and loss. Interestingly, much variability lies in intergenic regions, suggesting that the Enterobacteriacae may exhibit regulatory divergence.ConclusionsThe multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve.

Highlights

  • Multiple genome alignment is among the most basic tools in the comparative genomics toolbox, its application has been hampered by concerns of accuracy and practicality [1,2,3]

  • The multiple genome alignments generated by our software provide a platform for comparative genomic and population genomic studies

  • Open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve

Read more

Summary

Introduction

Multiple genome alignment is among the most basic tools in the comparative genomics toolbox, its application has been hampered by concerns of accuracy and practicality [1,2,3]. During the course of evolution, genomes undergo both local and large-scale mutational processes. Large-scale mutations can include gain and loss or duplication of large segments, generated by unequal recombination or other processes. Homologous recombination can lead to replacement of whole genes, or even larger segments of the chromosome with non-identical but homologous sequences. Together, these mutational processes cause otherwise identical regions in two or more genomes to be fragmented, reordered, possibly missing, and even to occur in multiple copies. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call