Abstract

Bacteria can exchange and acquire new genetic material from other organisms directly and via the environment. This process, known as bacterial recombination, has a strong impact on the evolution of bacteria. For example, it spreads antibiotic resistance across clades and species and helps avoid clonal interference. Recombination hinders phylogenetic and transmission inference because it creates patterns of substitutions (homoplasies) inconsistent with the hypothesis of a single evolutionary tree. Bacterial recombination is typically modeled as statistically akin to gene conversion in eukaryotes, i.e., using the coalescent with gene conversion (CGC). However, this model can be very computationally demanding, as it needs to account for the correlations between the evolutionary histories of even distant loci. With the increasing popularity of whole genome sequencing, a faster approach to modeling and simulating bacterial genome evolution is therefore needed. We present a new model that approximates the coalescent with gene conversion: the bacterial sequential Markov coalescent (BSMC). Our approach builds on an idea similar to that of the sequential Markov coalescent (SMC), an approximation of the coalescent with crossover recombination. However, bacterial recombination poses hurdles to a sequential Markov approximation, as it leads to strong correlations and linkage disequilibrium across very distant sites in the genome. Our BSMC overcomes these difficulties. It considerably reduces computational demand compared to the exact CGC while producing very similar patterns in simulated data. We implemented the BSMC model within a new simulation software package, FastSimBac. In addition to its lower computational demand compared to previous bacterial genome evolution simulators, FastSimBac provides more general options for evolutionary scenarios, allowing population structure with migration, speciation, population size changes, and recombination hotspots.
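
Why recombination produces patterns "inconsistent with the hypothesis of a single evolutionary tree" can be made concrete with the classic four-gamete test. Under an infinite-sites model on one tree, any two biallelic sites show at most three of the four haplotypes 00, 01, 10, 11. Seeing all four implies a recombination event between the sites or a repeated mutation (a homoplasy). The sketch below assumes haplotypes coded as 0/1 strings. The function names are illustrative and are not part of FastSimBac.

```python
from itertools import combinations


def four_gamete_incompatible(site_a, site_b):
    """Return True if two biallelic sites carry all four gametes (00, 01, 10, 11).

    On a single tree under the infinite-sites model this cannot happen, so it
    signals recombination or a homoplasy between the two sites.
    """
    return len(set(zip(site_a, site_b))) == 4


def incompatible_pairs(haplotypes):
    """List (i, j) index pairs of sites that fail the four-gamete test.

    `haplotypes` is a list of equal-length '0'/'1' strings, one per sample.
    """
    n_sites = len(haplotypes[0])
    columns = [[h[i] for h in haplotypes] for i in range(n_sites)]
    return [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if four_gamete_incompatible(columns[i], columns[j])
    ]


# Sites 0 and 2 induce the same split of the samples; site 1 conflicts with both.
samples = ["000", "010", "101", "111"]
print(incompatible_pairs(samples))  # → [(0, 1), (1, 2)]
```

The test only flags incompatibility. It cannot tell recombination apart from recurrent mutation, which is why recombination and homoplasy are hard to disentangle in tree-based inference.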
FastSimBac is available from https://bitbucket.org/nicofmay/fastsimbac and is distributed as open source under the terms of the GNU General Public License. Lastly, we use the BSMC within an Approximate Bayesian Computation (ABC) inference scheme. Our results suggest that parameters used to simulate data under the exact CGC can be correctly recovered, further showcasing the accuracy of the BSMC. With this ABC scheme, we infer the recombination rate, mutation rate, and recombination tract length of Bacillus cereus from a whole genome alignment.
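
The general shape of an ABC analysis can be illustrated with plain rejection sampling. You draw parameters from a prior, simulate data, and keep the draws whose summary statistics fall closest to the observed ones. This is a toy sketch, not the paper's ABC scheme, summary statistics, or priors. `toy_simulate` is a hypothetical stand-in for a FastSimBac run (a Poisson count of segregating sites), and the single parameter `theta` stands in for the real rates being inferred.

```python
import math
import random


def toy_simulate(theta, rng):
    """HYPOTHETICAL stand-in simulator: segregating-site count ~ Poisson(theta).

    A real analysis would instead run FastSimBac and compute summary
    statistics from the simulated alignment.
    """
    # Knuth's multiplication method for Poisson sampling (fine for moderate theta).
    limit = math.exp(-theta)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def rejection_abc(observed, prior_low, prior_high, n_draws, accept_frac, seed=0):
    """Rejection ABC for one parameter with a uniform prior.

    Keeps the fraction `accept_frac` of prior draws whose simulated statistic
    lies closest (in absolute distance) to the observed statistic.
    """
    rng = random.Random(seed)
    scored = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)
        distance = abs(toy_simulate(theta, rng) - observed)
        scored.append((distance, theta))
    scored.sort(key=lambda pair: pair[0])
    n_keep = max(1, round(accept_frac * n_draws))
    return [theta for _, theta in scored[:n_keep]]


posterior = rejection_abc(observed=20, prior_low=0.0, prior_high=50.0,
                          n_draws=20_000, accept_frac=0.02)
print(f"accepted {len(posterior)} draws, posterior mean ~ {sum(posterior) / len(posterior):.2f}")
```

With this flat prior and a Poisson likelihood, the exact posterior mean is close to 21, so the printed estimate should land near that value. The speed of the simulator is what limits this kind of analysis, because each accepted draw costs many simulations. That is the motivation for plugging a fast approximation such as the BSMC into the loop.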

Highlights

  • Similar to the exact sequential form of the coalescent with recombination (Wiuf and Hein 1999), the sequential Markov coalescent (SMC) starts by considering one evolutionary tree at the left (i.e., 5′) end of the sequence, and generates new trees affected by recombination as it moves toward the right (3′) end

  • We ignore recombination events that occurred at distant, previously considered positions. This approach differs from other approximations to the coalescent with gene conversion (CGC) (e.g., Didelot et al 2010; Ansari and Didelot 2014) in two ways. We can simulate entire genomes while allowing recombining lineages with overlapping ancestral material to coalesce with one another. We also allow recombination events to split the ancestral material of recombinant lineages
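
The left-to-right scheme of the SMC can be sketched for the simplest case: a sample of two sequences with crossover recombination. This is the standard SMC, not the BSMC, and it assumes coalescent time units in which the pair coalesces at rate 1, with a scaled crossover rate `rho` (> 0) per unit of sequence length. The tree at each position is summarized by the pair's coalescence time `T`. Breakpoints arrive along the sequence at rate `rho * T` (total branch length `2T` times `rho / 2`). At a breakpoint, one lineage is cut at a uniform time below `T`. Under the SMC, that lineage can only re-coalesce with the other lineage, after an extra Exp(1) wait. The function name and parameterization are illustrative assumptions.

```python
import random


def pairwise_smc(seq_length, rho, seed=None):
    """Simulate local coalescence times for two sequences under the SMC.

    Returns (start, end, tmrca) segments tiling [0, seq_length). Times are in
    coalescent units (pair coalescence rate 1); `rho` > 0 is the scaled
    crossover rate per unit of sequence length.
    """
    rng = random.Random(seed)
    segments = []
    position = 0.0
    tmrca = rng.expovariate(1.0)  # tree at the left end of the sequence
    while position < seq_length:
        # Breakpoints occur at rate (rho / 2) * (2 * tmrca) along the sequence.
        next_break = position + rng.expovariate(rho * tmrca)
        end = min(next_break, seq_length)
        segments.append((position, end, tmrca))
        position = end
        if position < seq_length:
            # Cut one lineage at a uniformly chosen time below the current TMRCA.
            cut_time = rng.uniform(0.0, tmrca)
            # SMC: the detached lineage re-coalesces only with the other lineage,
            # at rate 1 from the cut time onward (never with its own old branch).
            tmrca = cut_time + rng.expovariate(1.0)
    return segments


for start, end, t in pairwise_smc(seq_length=10.0, rho=1.0, seed=1)[:5]:
    print(f"[{start:.2f}, {end:.2f})  TMRCA = {t:.3f}")
```

Each new tree depends only on the current one, which is what makes the process Markov along the sequence. Gene-conversion tracts break this picture, because a tract has both a start and an end, so the tree after the tract is correlated with the tree before it. That correlation is the difficulty the BSMC has to address.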

Summary

Introduction

Bacteria can exchange and acquire new genetic material from other organisms directly and via the environment. Frequent recombination events can break ancestral material intervals into ever smaller pieces, reducing them far below the expected length of an individual recombination interval. Ignoring these complexities leads to biases when recombination rates are elevated (Didelot et al 2010). By accounting for them, we aim to produce a model more faithful to the CGC. We call this model the bacterial sequential Markov coalescent (BSMC), and we implement it within new simulation software called FastSimBac. FastSimBac is faster than previous bacterial genome evolution simulators.
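
How repeated recombination events fragment ancestral material can be sketched with interval arithmetic. This is an illustrative toy, not FastSimBac's data structure. A lineage's ancestral material is a sorted list of half-open intervals. A tract [a, b) splits it into the part inside the tract and the part outside it, and the two parts trace back to different parents. Tract starts are uniform, and tract lengths are geometric with mean `delta` (> 1). Geometric lengths are a common choice in gene-conversion models and are assumed here. The function names are hypothetical.

```python
import math
import random


def split_by_tract(material, tract_start, tract_end):
    """Split ancestral material (sorted half-open (start, end) intervals) by a tract.

    Returns (inside, outside): pieces covered by [tract_start, tract_end)
    and pieces not covered by it, each still sorted.
    """
    inside, outside = [], []
    for start, end in material:
        if start < min(end, tract_start):  # piece left of the tract
            outside.append((start, min(end, tract_start)))
        lo, hi = max(start, tract_start), min(end, tract_end)
        if lo < hi:  # piece overlapping the tract
            inside.append((lo, hi))
        if max(start, tract_end) < end:  # piece right of the tract
            outside.append((max(start, tract_end), end))
    return inside, outside


def sample_tract(genome_length, delta, rng):
    """Uniform start, Geometric(1 / delta) length on {1, 2, ...}, clipped at the genome end."""
    start = rng.randrange(genome_length)
    # Inverse-CDF draw: P(length > k) = (1 - p) ** k with p = 1 / delta.
    p = 1.0 / delta
    length = 1 + int(math.log(1.0 - rng.random()) / math.log(1.0 - p))
    return start, min(start + length, genome_length)


rng = random.Random(42)
genome_length, delta = 100_000, 500.0
material = [(0, genome_length)]  # a lineage whose whole genome is ancestral
for _ in range(20):
    tract_start, tract_end = sample_tract(genome_length, delta, rng)
    # Follow the parent that contributed everything outside the tract.
    _, material = split_by_tract(material, tract_start, tract_end)
print(len(material), "intervals,", sum(e - s for s, e in material), "ancestral sites")
```

Following only the material outside each tract shows how intervals shrink and split. A full CGC simulator would also track the lineage carrying the tract material, and coalescence events would merge intervals again.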
