Towards better prediction of Mycobacterium tuberculosis lineages from MIRU-VNTR data

Nithum Thain, Christopher Le, Aldo Crossa, Shama Desai Ahuja, Jeanne Sullivan Meissner, Barun Mathema, Barry Kreiswirth, Natalia Kurepina, Ted Cohen, Leonid Chindelevitch

doi:10.1016/j.meegid.2018.06.029

Abstract

The determination of lineages from strain-based molecular genotyping information is an important problem in tuberculosis. Mycobacterial interspersed repetitive unit-variable number tandem repeat (MIRU-VNTR) typing is a commonly used molecular genotyping approach that counts the number of times each of a set of pre-specified loci is repeated in a strain. There are three main approaches for determining lineage from MIRU-VNTR data: one based on a direct comparison to the strains in a curated database, and two others based on machine learning algorithms trained on a large collection of labeled data.

All existing methods have limitations. The direct approach imposes an arbitrary threshold on how much a database strain can differ from a given strain and still be informative. The machine learning-based approaches, on the other hand, require a substantial amount of labeled data. Notably, all three methods exhibit suboptimal classification accuracy without additional data.

We explore several computational approaches to address these limitations. First, we show that eliminating the arbitrary threshold improves the performance of the direct approach. Second, we introduce RuleTB, an alternative direct method that proposes a concise set of rules for determining lineages. Lastly, we propose StackTB, a machine learning approach that requires only a fraction of the training data to outperform both existing machine learning methods in accuracy.

Our approaches demonstrate superior performance on a training dataset collected in New York City over 10 years, and the improvement in performance carries over to a held-out testing set. We conclude that our methods provide opportunities for improving the determination of pathogenic lineages based on MIRU-VNTR data.
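The abstract does not describe StackTB's internals, although its name suggests a stacked ensemble. The sketch below illustrates generic stacking (stacked generalization) on MIRU-VNTR-style profiles and is not the authors' StackTB. It assumes profiles are tuples of integer copy numbers. The two base learners (`fit_nearest`, `fit_locus_vote`), the lookup-table meta-learner, and the 4-locus toy data are all hypothetical choices made for illustration. The key idea is that the meta-learner is trained only on out-of-fold predictions of the base learners, so it learns how far to trust each one on unseen strains.

```python
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

# Hypothetical illustration of stacking; NOT the published StackTB.
Profile = Tuple[int, ...]  # copy number at each MIRU-VNTR locus
Predictor = Callable[[Profile], str]
Fitter = Callable[[Sequence[Profile], Sequence[str]], Predictor]


def hamming(a: Profile, b: Profile) -> int:
    """Number of loci at which two profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def fit_nearest(X: Sequence[Profile], y: Sequence[str]) -> Predictor:
    """Base learner 1: lineage of the closest training profile (ties go to the earlier one)."""
    data = list(zip(X, y))

    def predict(p: Profile) -> str:
        return min(data, key=lambda item: hamming(p, item[0]))[1]

    return predict


def fit_locus_vote(X: Sequence[Profile], y: Sequence[str]) -> Predictor:
    """Base learner 2: each locus votes for the lineage most often seen with its copy number."""
    tables = [defaultdict(Counter) for _ in range(len(X[0]))]
    for p, label in zip(X, y):
        for i, c in enumerate(p):
            tables[i][c][label] += 1
    fallback = Counter(y).most_common(1)[0][0]

    def predict(p: Profile) -> str:
        votes = Counter()
        for i, c in enumerate(p):
            if c in tables[i]:  # membership test does not create a new key
                votes[tables[i][c].most_common(1)[0][0]] += 1
        return votes.most_common(1)[0][0] if votes else fallback

    return predict


def fit_stack(X: Sequence[Profile], y: Sequence[str], base_fitters: List[Fitter], k: int = 3) -> Predictor:
    """Meta-learner: a table from the tuple of out-of-fold base predictions to the
    most common true lineage; unseen tuples fall back to the first base learner."""
    n = len(X)
    meta = defaultdict(Counter)
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        held_out = [i for i in range(n) if i % k == fold]
        models = [fit([X[i] for i in train], [y[i] for i in train]) for fit in base_fitters]
        for i in held_out:
            meta[tuple(m(X[i]) for m in models)][y[i]] += 1
    final_models = [fit(X, y) for fit in base_fitters]  # refit base learners on all data

    def predict(p: Profile) -> str:
        key = tuple(m(p) for m in final_models)
        return meta[key].most_common(1)[0][0] if key in meta else key[0]

    return predict


# Invented 4-locus toy data with placeholder lineage labels.
X = [(2, 3, 4, 5), (2, 3, 4, 6), (2, 3, 5, 5), (7, 1, 1, 2), (7, 1, 1, 3), (7, 2, 1, 2)]
y = ["L2", "L2", "L2", "L4", "L4", "L4"]
model = fit_stack(X, y, [fit_nearest, fit_locus_vote])
print(model((2, 3, 4, 4)))  # prints L2
```

A lookup table keeps the meta-learner readable for a handful of base learners. With more base learners or many lineages, a regularized classifier would be a more usual choice.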

Highlights

As we describe in the Supplementary Materials, Malioutov and Varshney [23] propose a linear programming (LP)-based approach to the NP-hard problem [24] of identifying the smallest set of complex rules of this form.
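The form of the rules is not given in this excerpt. To make the notion of a concise rule set concrete, the sketch below assumes a hypothetical format: each rule is a conjunction of (locus index, copy number) equality tests paired with a lineage, and rules are applied as an ordered decision list. It only evaluates a given rule set and measures its size (total number of conditions). It does not implement the LP approach of Malioutov and Varshney [23], nor RuleTB itself. The lineage names "A" and "B" are placeholders.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical rule format (an assumption, not necessarily the paper's):
# a rule is a conjunction of (locus_index, copy_number) tests plus a lineage.
Condition = Tuple[int, int]
Rule = Tuple[Tuple[Condition, ...], str]


def rule_fires(rule: Rule, profile: Sequence[int]) -> bool:
    """True if the profile satisfies every condition in the rule."""
    conditions, _ = rule
    return all(profile[i] == c for i, c in conditions)


def apply_rules(rules: Sequence[Rule], profile: Sequence[int],
                default: Optional[str] = None) -> Optional[str]:
    """Decision-list semantics: the first rule that fires assigns the lineage."""
    for rule in rules:
        if rule_fires(rule, profile):
            return rule[1]
    return default


def rule_set_size(rules: Sequence[Rule]) -> int:
    """Total number of conditions, one possible measure of rule-set conciseness."""
    return sum(len(conditions) for conditions, _ in rules)


rules = [
    (((0, 2), (3, 5)), "A"),  # locus 0 has 2 copies AND locus 3 has 5 copies -> "A"
    (((1, 7),), "B"),         # locus 1 has 7 copies -> "B"
]
print(apply_rules(rules, (2, 0, 0, 5)))  # prints A
print(apply_rules(rules, (0, 7, 0, 0)))  # prints B
print(rule_set_size(rules))              # prints 3
```

Minimizing `rule_set_size` while keeping the rules consistent with the labeled data is the kind of combinatorial problem the LP relaxation above is meant to approximate.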

Summary

Introduction

The genetic diversity of the infectious pathogen Mycobacterium tuberculosis has played an important role in its adaptation to its diverse host species, including humans [1, 2, 3]. Several different molecular genotyping methods have been used to assign lineages to M. tuberculosis, including restriction fragment length polymorphism (RFLP), spacer oligonucleotide typing (spoligotyping), large sequence polymorphisms (LSPs), single nucleotide polymorphisms (SNPs), and mycobacterial interspersed repetitive unit-variable number tandem repeats (MIRU-VNTR).

The direct method consists of assigning to a strain of interest the lineage of the database strain that differs from it at the smallest number of loci, provided that this number does not exceed 4 out of 24 loci. Another widely used method, TB-Insight [21], uses a machine learning technique called Conformal Bayesian Networks for the classification problem.

We separate our dataset into a training set, which we use to develop our methods, and a testing set, which we use to evaluate their performance.
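The direct method can be sketched as a nearest-match search in which the distance between two strains is the number of loci whose copy numbers differ (a Hamming distance). This is a minimal sketch assuming profiles are tuples of integer copy numbers and the curated database is a plain list of (profile, lineage) pairs; it does not reproduce any real database's interface, and breaking ties in favour of the earlier entry is our assumption. Passing `max_diff=None` corresponds to removing the 4-of-24 threshold. The profiles and lineage names below are invented.

```python
from typing import Optional, Sequence, Tuple

Profile = Tuple[int, ...]  # copy number at each of the 24 MIRU-VNTR loci


def loci_differing(a: Profile, b: Profile) -> int:
    """Number of loci at which two profiles have different copy numbers."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same loci")
    return sum(x != y for x, y in zip(a, b))


def assign_lineage(query: Profile,
                   database: Sequence[Tuple[Profile, str]],
                   max_diff: Optional[int] = 4) -> Optional[str]:
    """Return the lineage of the closest database strain.

    Returns None if the database is empty or if the closest strain differs at
    more than max_diff loci; max_diff=None drops the threshold entirely.
    Ties go to the earlier database entry (an assumption).
    """
    best_lineage, best_diff = None, None
    for profile, lineage in database:
        d = loci_differing(query, profile)
        if best_diff is None or d < best_diff:
            best_lineage, best_diff = lineage, d
    if best_diff is None or (max_diff is not None and best_diff > max_diff):
        return None
    return best_lineage


# Toy database with invented profiles and placeholder lineage names.
database = [((2,) * 24, "lineage_X"), ((5,) * 24, "lineage_Y")]
query = (3,) * 5 + (2,) * 19  # differs from the first entry at 5 of 24 loci
print(assign_lineage(query, database))                 # prints None
print(assign_lineage(query, database, max_diff=None))  # prints lineage_X
```

The example shows the cost of the threshold: the query is clearly closest to `lineage_X`, yet with the default cutoff it is left unassigned.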

Dataset preparation

Removing the arbitrary threshold

Producing interpretable rules

Designing a machine learning method

Implementation

Model Training

Performance Analysis

Sensitivity Analysis

Performance of our methods on broad lineages

Performance of our methods on a refined classification

Comparison to existing methods

Discussion


Journal: Infection, Genetics and Evolution
Publication Date: Jun 28, 2018
Citations: 9
License type: cc-by

Similar Papers

Mycobacterial Interspersed Repetitive Unit-variable Number Tandem Repeat (MIRU-VNTR) Typing Lacks Discriminatory Power in the Genetic Analysis of Bovine Tuberculosis in Egypt
Hebatallah Ahmed Mahgoub ... Walaa Awadin
American Journal of Microbiological Research | VOL. 5 | 22 Dec 2017

Genetic Diversity of Drug-resistant Mycobacterium tuberculosis Isolates in Isfahan Province of Iran
Bahram Nasr Esfahani ... Sharareh Moghim
Advanced Biomedical Research | VOL. 7 | 01 Jan 2018

First insight into the genetic population structure of Mycobacterium tuberculosis isolated from pulmonary tuberculosis patients in Egypt
Hassan Mahmoud Diab ... Yasuhiko Suzuki
Tuberculosis | VOL. 96 | 14 Nov 2015

Genotyping of Clinical Mycobacterium tuberculosis Isolates Based on IS6110 and MIRU-VNTR Polymorphisms
Anna Żaczek ... Anna Brzostek
BioMed research international | VOL. 2013 | 01 Jan 2013
