Abstract

The molecular clock and its phylogenetic applications to genomic data have changed how we study and understand one of the major human pathogens, Mycobacterium tuberculosis (MTB), the etiologic agent of tuberculosis. Genome sequences of MTB strains sampled at different times are increasingly used to infer when a particular outbreak began, when a drug-resistant clone appeared and expanded, or when a strain was introduced into a specific region. Despite the growing importance of the molecular clock in tuberculosis research, there is no consensus on whether MTB displays clocklike behavior or on its rate of evolution. Here we performed a systematic study of the molecular clock of MTB on a large genomic data set (6,285 strains), covering different epidemiological settings and most of the known global diversity. We found that sampling periods shorter than 15–20 years were often insufficient to calibrate the clock of MTB. For data sets where such calibration was possible, we obtained a clock rate between 1×10⁻⁸ and 5×10⁻⁷ nucleotide changes per site per year (0.04–2.2 SNPs per genome per year), with substantial differences between clades. These estimates were not strongly dependent on the age of the calibration points: they changed only marginally whether we calibrated the tree with epidemiological isolates (sampled in the last 40 years) or with three ancient DNA samples (about 1,000 years old). Additionally, the uncertainty and the discrepancies between the results of different methods were sometimes large, highlighting the importance of using several methods and of carefully considering their assumptions and limitations.
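The two ways of expressing the clock rate above are related by the genome length. The following minimal sketch shows the conversion; it assumes the length of the H37Rv reference genome (4,411,532 bp), which is not stated in the text above, and the function name is purely illustrative.

```python
# Assumption: genome length of the MTB H37Rv reference (4,411,532 bp).
# This value is not given in the abstract; it is used here for illustration only.
GENOME_LENGTH_BP = 4_411_532


def per_site_to_per_genome(rate_per_site_per_year, genome_length_bp=GENOME_LENGTH_BP):
    """Convert a per-site-per-year rate into expected changes per genome per year."""
    return rate_per_site_per_year * genome_length_bp


# Lower and upper bounds of the clock rate reported in the abstract.
low = per_site_to_per_genome(1e-8)
high = per_site_to_per_genome(5e-7)

print(f"{low:.2f} to {high:.1f} SNPs per genome per year")
# prints "0.04 to 2.2 SNPs per genome per year"
```

Under this assumed genome length, the per-site bounds reproduce the per-genome range quoted in the abstract.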

Highlights

  • In 1962, Zuckerkandl and Pauling used the number of amino-acid differences among hemoglobin sequences to infer the divergence time between human and gorilla, in what was the first application of the molecular clock [1]

  • Since the publication of the Mycobacterium tuberculosis (MTB) reference genome [6], whole genome sequence (WGS) data from MTB strains have become available at an increasing pace; especially in the last five years, studies using large WGS data sets have allowed precise estimates of MTB genetic diversity and of the molecular clock rate

  • To test the temporal structure of MTB data sets, we identified 6,285 contemporary strains that passed our quality filters, and for which the date of isolation was known (S1 Table)


Introduction

In 1962, Zuckerkandl and Pauling used the number of amino-acid differences among hemoglobin sequences to infer the divergence time between human and gorilla, in what was the first application of the molecular clock [1]. Thanks to improvements in sequencing technologies and statistical techniques, it is now possible to use sequences sampled at different times to calibrate the molecular clock and study the temporal dimension of evolutionary processes in so-called measurably evolving populations [4]. These advances have been most relevant for studies of ancient DNA (aDNA) and of the evolutionary dynamics of pathogen populations, including one of the deadliest human pathogens: Mycobacterium tuberculosis. One example of the potential of molecular clock analyses is the study of Eldholm and colleagues [20], in which the collapse of the Soviet Union and of its health system was linked to the increased emergence of drug-resistant strains in former Soviet Republics, providing insights into the evolutionary processes promoting drug resistance.
