Abstract

Clonal populations accumulate mutations over time, resulting in different haplotypes. Deep sequencing of such a population in principle provides the information needed to reconstruct these haplotypes and the frequencies at which they occur. However, this reconstruction is technically challenging, especially in clonal systems with a relatively low mutation frequency. The low number of segregating sites in those systems adds ambiguity to the haplotype phasing and thus hampers the reconstruction of genome-wide haplotypes based on sequence overlap information. We therefore present EVORhA, a haplotype reconstruction method that complements the phasing information in the non-empty read overlap with the frequency estimates of inferred local haplotypes. Simulated data show that, as soon as read lengths and/or mutation rates become restrictive for state-of-the-art methods, this additional frequency information still allows EVORhA to reliably reconstruct genome-wide haplotypes. On real data, we show the applicability of the method in reconstructing the population composition of evolved bacterial populations and in decomposing mixed bacterial infections from clinical samples.
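The abstract's central idea, connecting local haplotypes through their frequencies where reads provide no overlap, can be illustrated with a minimal sketch. This is not EVORhA's implementation: the function name `link_by_frequency`, the input format (local haplotype string mapped to its estimated frequency), the rank-based pairing rule and the `min_gap` threshold are all hypothetical assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: each window maps a local haplotype string
# (alleles at that window's segregating sites) to its estimated frequency.
Window = Dict[str, float]


def link_by_frequency(left: Window, right: Window,
                      min_gap: float = 0.05) -> List[Tuple[str, str]]:
    """Pair local haplotypes of two windows that share no reads.

    Illustrative sketch only: haplotypes are paired in order of decreasing
    frequency, which minimises the summed absolute frequency difference when
    both windows contain the same number of haplotypes. Linking is refused
    when two haplotypes within a window have near-identical frequencies,
    because frequency then carries no phasing information.
    """
    if len(left) != len(right):
        raise ValueError("windows contain different numbers of haplotypes")
    left_sorted = sorted(left.items(), key=lambda kv: kv[1], reverse=True)
    right_sorted = sorted(right.items(), key=lambda kv: kv[1], reverse=True)
    # Check that consecutive frequencies in each window are well separated.
    for ranked in (left_sorted, right_sorted):
        freqs = [f for _, f in ranked]
        if any(a - b < min_gap for a, b in zip(freqs, freqs[1:])):
            raise ValueError("frequencies too close to link unambiguously")
    # Pair haplotypes with the same frequency rank.
    return [(lh, rh) for (lh, _), (rh, _) in zip(left_sorted, right_sorted)]


left = {"AC": 0.62, "GC": 0.30, "GT": 0.08}
right = {"TT": 0.09, "CA": 0.60, "CG": 0.31}
print(link_by_frequency(left, right))
# [('AC', 'CA'), ('GC', 'CG'), ('GT', 'TT')]
```

Rank-based pairing only works when haplotypes in each window have clearly separated frequencies; when two haplotypes occur at similar frequencies the sketch refuses to link them, which is why frequency information complements rather than replaces read-overlap phasing.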

Highlights

  • The genetic heterogeneity of clonal populations is key to their adaptive behavior

  • Our method consists of two steps: (1) a local haplotype reconstruction followed by a window extension, in which haplotypes are defined at the local level, sequencing errors are removed, and overlapping regions sharing polymorphisms are extended into longer haplotypes; and (2) a genome-wide reconstruction, in which the final haplotypes and their relative frequencies are inferred from the frequency observations of the extended haplotypes

  • In this work we present EVORhA, a method for reconstructing haplotypes from deep sequencing data of clonal populations that have a relatively low mutation rate, such as bacteria
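The window-extension step described in the highlights can be sketched as follows, under assumptions that are not taken from the source: local haplotypes are given as position-to-allele mappings for consecutive, overlapping windows; two local haplotypes are merged when they agree on every shared segregating site; and a partial haplotype is closed whenever the overlap matches zero or several candidates. The names `compatible` and `extend_windows` are hypothetical, and sequencing-error removal is not modelled.

```python
from typing import Dict, List

# Hypothetical representation: a local haplotype maps the genomic position
# of each segregating site in its window to the observed allele.
Haplotype = Dict[int, str]


def compatible(a: Haplotype, b: Haplotype) -> bool:
    """True if a and b share at least one site and agree on all shared sites."""
    shared = a.keys() & b.keys()
    return bool(shared) and all(a[p] == b[p] for p in shared)


def extend_windows(windows: List[List[Haplotype]]) -> List[Haplotype]:
    """Extend local haplotypes across consecutive, overlapping windows.

    A partial haplotype is extended with a local haplotype of the next
    window only when exactly one candidate is compatible; otherwise the
    overlap is ambiguous and the partial haplotype is closed.
    """
    if not windows:
        return []
    finished: List[Haplotype] = []
    current = [dict(h) for h in windows[0]]
    for nxt in windows[1:]:
        extended: List[Haplotype] = []
        used = set()
        for hap in current:
            matches = [i for i, cand in enumerate(nxt) if compatible(hap, cand)]
            if len(matches) == 1:
                extended.append({**hap, **nxt[matches[0]]})
                used.add(matches[0])
            else:
                finished.append(hap)  # ambiguous or no overlap: stop here
        # Local haplotypes of the next window that extended nothing start
        # new partial haplotypes.
        extended += [dict(c) for i, c in enumerate(nxt) if i not in used]
        current = extended
    return finished + current


w1 = [{100: "A", 200: "C", 300: "G"}, {100: "T", 200: "C", 300: "A"}]
w2 = [{200: "C", 300: "G", 400: "T"}, {200: "C", 300: "A", 400: "G"}]
print(extend_windows([w1, w2]))
```

Here each haplotype of the first window agrees with exactly one haplotype of the second on the shared sites 200 and 300, so both are extended to position 400. In this sketch, fragments that cannot be extended unambiguously are what a subsequent frequency-based genome-wide step would need to connect.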


Summary

Introduction

The genetic heterogeneity of clonal populations is key to their adaptive behavior. Environment-specific genes, subject to relaxed selection in a non-inducing environment, build up cryptic variation that enhances the adaptive potential [1,2]. Even when evolution starts from a single clone (haplotype) under severe selection pressure, the combination of mutation rate and population size appears to be sufficiently high to build up genetic variation in the population [3,4], resulting in a mixture of closely related haplotypes (or quasispecies). Deep sequencing of a clonal population in its entirety, referred to as pooled or metagenomic sequencing [9], inherently provides the information needed to determine the haplotypic variation of the population, i.e. the identity of the occurring haplotypes and their frequencies.
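To make concrete why pooled sequencing data carry information about haplotype identity and frequency: in the absence of sequencing errors and sampling noise, the expected frequency of a non-reference allele at a site equals the summed frequency of all haplotypes carrying it. The helper below (hypothetical name `expected_allele_frequencies`, assuming aligned, equal-length haplotype sequences) computes this forward model. Haplotype reconstruction amounts to inverting it, which from per-site frequencies alone is in general underdetermined.

```python
from typing import Dict, List


def expected_allele_frequencies(haplotypes: Dict[str, float],
                                reference: str) -> List[float]:
    """Expected non-reference allele frequency at every position.

    `haplotypes` maps an aligned haplotype sequence (same length as
    `reference`) to its relative frequency in the population. Sequencing
    errors and sampling noise are ignored.
    """
    if abs(sum(haplotypes.values()) - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    freqs = [0.0] * len(reference)
    for seq, f in haplotypes.items():
        if len(seq) != len(reference):
            raise ValueError("haplotype and reference lengths differ")
        # Every haplotype differing from the reference at a site
        # contributes its full frequency to that site.
        for i, (base, ref_base) in enumerate(zip(seq, reference)):
            if base != ref_base:
                freqs[i] += f
    return freqs


print(expected_allele_frequencies(
    {"ACGTA": 0.5, "ACCTA": 0.3, "ACCTG": 0.2}, "ACGTA"))
```

For these three haplotypes, the function returns 0.5 at position 2 and 0.2 at position 4 (0-based) and 0.0 elsewhere. The same per-site values would also arise from other haplotype mixtures, which is why linkage information from reads is needed in addition to allele frequencies.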

