Abstract

Tuberculosis disease is a major global public health concern and the growing prevalence of drug-resistant Mycobacterium tuberculosis is making disease control more difficult. However, the increasing application of whole-genome sequencing as a diagnostic tool is leading to the profiling of drug resistance to inform clinical practice and treatment decision making. Computational approaches for identifying established and novel resistance-conferring mutations in genomic data include genome-wide association study (GWAS) methodologies, tests for convergent evolution and machine learning techniques. These methods may be confounded by extensive co-occurrent resistance, where statistical models for a drug include unrelated mutations known to be causing resistance to other drugs. Here, we introduce a novel ‘cannibalistic’ elimination algorithm (“Hungry, Hungry SNPos”) that attempts to remove these co-occurrent resistant variants. Using an M. tuberculosis genomic dataset for the virulent Beijing strain-type (n = 3,574) with phenotypic resistance data across five drugs (isoniazid, rifampicin, ethambutol, pyrazinamide, and streptomycin), we demonstrate that this new approach is considerably more robust than traditional methods and detects resistance-associated variants too rare to be likely picked up by correlation-based techniques like GWAS.

Highlights

  • Tuberculosis disease (TB), caused by bacteria in the Mycobacterium tuberculosis (Mtb) complex, is a major global public health burden

  • To check for systematic biases within the set of missense single nucleotide polymorphisms (SNPs), the distributions of variants, non-calls and allele-counts were compared with the original dataset holding all SNPs and no striking differences were found (S3 and S4 Figs)

  • There is a high level of co-occurring resistance, especially between partner drugs such as INH and RIF, leading to a high proportion of the samples being multidrug-resistant TB (MDR-TB) (Fig 1, bottom)

Read more

Summary

Introduction

Tuberculosis disease (TB), caused by bacteria in the Mycobacterium tuberculosis (Mtb) complex, is a major global public health burden. The Mtb genome is 4.4 Mb in size, features a high (65%) GC-content and contains *4,000 genes [2]. Of the seven main lineages comprising the Mtb complex, four predominantly infect humans and have spread globally (Lineage 1: Indo-Oceanic, Lineage 2: East Asian, Lineage 3: East-Africa-Indian and Lineage 4: Euro-American) [4]. They can vary in virulence, transmissibility, and drug resistance as well as geographic distribution and spread [3, 5, 6]. Lineage 2, especially Beijing strains, have shown to be mobile with evidence of recent spread from Asia to Europe and Africa [7, 8]

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.