Abstract

Homoplasic SNPs are considered important signatures of strong (positive) selective pressure, and hence of adaptive evolution for clinically relevant traits such as antibiotic resistance and virulence. Here we present a new tool, SNPPar, for efficient detection and analysis of homoplasic SNPs from large whole genome sequencing datasets (>1000 isolates and/or >100 000 SNPs). SNPPar takes as input an SNP alignment, tree and annotated reference genome, and uses a combination of simple monophyly tests and ancestral state reconstruction (ASR, via TreeTime) to assign mutation events to branches and identify homoplasies. Mutations are annotated at the level of codon and gene, to facilitate analysis of convergent evolution. Testing on simulated data (120 Mycobacterium tuberculosis alignments representing local and global samples) showed SNPPar can detect homoplasic SNPs with very high specificity (zero false-positives in all tests) and high sensitivity (zero false-negatives in 89 % of tests). SNPPar analysis of three empirically sampled datasets (Elizabethkingia anophelis, Burkholderia dolosa and M. tuberculosis) produced results that were in concordance with previous studies, in terms of both individual homoplasies and evidence of convergence at the codon and gene levels. SNPPar analysis of a simulated alignment of ~64 000 genome-wide SNPs from 2000 M. tuberculosis genomes took ~23 min and ~2.6 GB of RAM to generate complete annotated results on a laptop. This analysis required ASR be conducted for only 1.25 % of SNPs, and the ASR step took ~23 s and 0.4 GB of RAM. SNPPar automates the detection and annotation of homoplasic SNPs efficiently and accurately from large SNP alignments. As demonstrated by the examples included here, this information can be readily used to explore the role of homoplasy in parallel and/or convergent evolution at the level of nucleotide, codon and/or gene.
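
The idea of using a cheap monophyly test to decide which SNPs need ASR can be illustrated with a minimal sketch. This is not SNPPar's implementation: the toy tree, the isolate names, and the function names `clades` and `needs_asr` are hypothetical, the tree is assumed to be rooted, and only biallelic sites are handled. If the isolates carrying either allele form a complete clade, a single mutation on that clade's stem branch explains the site and no reconstruction is required.

```python
# Hypothetical rooted toy tree as nested tuples; leaves are isolate names.
TREE = ((("A", "B"), "C"), ("D", "E"))


def clades(node):
    """Return (tips under node, list of every clade's tip set in the subtree)."""
    if isinstance(node, str):
        tips = frozenset([node])
        return tips, [tips]
    tips = frozenset()
    found = []
    for child in node:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found.extend(child_clades)
    found.append(tips)
    return tips, found


def needs_asr(tree, calls):
    """calls maps isolate -> base at one SNP position.

    Returns False when the carriers of either allele form a clade
    (one mutation event suffices), True otherwise.
    """
    alleles = set(calls.values())
    if len(alleles) != 2:
        # Assumption of this sketch: anything not biallelic is deferred.
        return True
    clade_sets = set(clades(tree)[1])
    for allele in alleles:
        carriers = frozenset(t for t, b in calls.items() if b == allele)
        if carriers in clade_sets:
            return False
    return True


# Carriers of T are exactly the clade (A, B): resolved without ASR.
clade_snp = {"A": "T", "B": "T", "C": "C", "D": "C", "E": "C"}
# Carriers of T are A and D, which do not form a clade: candidate homoplasy.
scattered_snp = {"A": "T", "B": "C", "C": "C", "D": "T", "E": "C"}

print(needs_asr(TREE, clade_snp))      # False
print(needs_asr(TREE, scattered_snp))  # True
```

Only sites like `scattered_snp` would be passed on to the more expensive reconstruction step, which is consistent with the small share of SNPs (1.25 %) reported above as needing ASR.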

Highlights

  • Bacterial pathogen populations are under strong selection from antimicrobials and host immune defences, and there is increasing interest in utilising whole genome sequencing (WGS) data to detect adaptive evolution in response to these strong selective pressures

  • To test the accuracy of SNPPar in detecting homoplasic single nucleotide polymorphisms (SNPs), we simulated 120 sets of sequences based on substitution model parameters and phylogenetic trees estimated from real Mycobacterium tuberculosis (Mtb) SNP alignment data[31]

  • Three types of false-negative homoplasy calls were observed; all involved SNPPar failing to identify a homoplasy in a scenario for which the available data supported a simpler single-mutation explanation

Summary

Introduction

Bacterial pathogen populations are under strong selection from antimicrobials and host immune defences, and there is increasing interest in utilising whole genome sequencing (WGS) data to detect adaptive evolution in response to these strong selective pressures. A key signature of adaptive evolution is the presence of homoplasies in the population[2,3,4]. Homoplasic traits are those that have been gained (or lost) independently in two or more lineages since their divergence from a common ancestor, in contrast to those traits that were gained or lost only once in a population and are shared by virtue of vertical inheritance from a common ancestor[5] (see Figure 1). Extending this to single nucleotide polymorphisms (SNPs), homoplasic SNPs are those where the same derived nucleotide is present in two or more lineages due to independent mutation events that occurred since their divergence from a common ancestor (which harboured a distinct ancestral nucleotide). Because, under the infinite sites model of molecular evolution[6], the same substitution event should not be observed multiple times in the absence of positive selection, homoplasic SNPs are considered important signatures of adaptive evolution.
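
To make the definition concrete, the sketch below counts the minimum number of substitution events needed to explain one SNP on a small tree. It uses Fitch parsimony as a deliberately simple stand-in, not the TreeTime-based reconstruction that SNPPar relies on. The tree, the isolate names, and the function name `fitch` are hypothetical, and the tree is assumed to be rooted and strictly bifurcating. For a biallelic site, a count above one means the pattern cannot be explained by a single mutation followed by vertical inheritance.

```python
# Hypothetical rooted, bifurcating toy tree; leaves are isolate names.
TREE = ((("A", "B"), "C"), ("D", "E"))


def fitch(node, calls):
    """Return (candidate ancestral states, minimum changes) for a subtree."""
    if isinstance(node, str):
        return {calls[node]}, 0
    left, right = node
    left_states, left_changes = fitch(left, calls)
    right_states, right_changes = fitch(right, calls)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_changes + right_changes
    # Children disagree: one substitution must occur on a child branch.
    return left_states | right_states, left_changes + right_changes + 1


# T is shared by the sister isolates A and B: one event, inherited vertically.
inherited = {"A": "T", "B": "T", "C": "C", "D": "C", "E": "C"}
# T appears in A and D, which sit in different lineages: two events.
parallel = {"A": "T", "B": "C", "C": "C", "D": "T", "E": "C"}

print(fitch(TREE, inherited)[1])  # 1
print(fitch(TREE, parallel)[1])   # 2
```

In the second case the inferred root state is C, so T is the derived nucleotide and has arisen twice, which matches the definition of a homoplasic SNP given above.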
