Accurate and rapid prediction of tuberculosis drug resistance from genome sequence data using traditional machine learning algorithms and CNN

Xingyan Kuang, Kyle M Hernandez, Robert L Grossman, Fan Wang, Zhenyu Zhang

doi:10.1038/s41598-022-06449-4

Open Access

Journal: Scientific Reports	Publication Date: Feb 14, 2022
Citations: 34	License type: open-access

Affiliation: University of Chicago

Abstract

Effective and timely antibiotic treatment depends on accurate and rapid in silico prediction of antimicrobial resistance (AMR). Existing statistical rule-based methods for predicting Mycobacterium tuberculosis (MTB) drug resistance from bacterial genomic sequencing data often give uneven results: high accuracy for some antibiotics but relatively low accuracy for others. Traditional machine learning (ML) approaches have been applied to classify MTB drug resistance and have shown more stable performance. However, no study has applied a deep learning architecture such as a convolutional neural network (CNN) to AMR prediction on a large and diverse cohort of MTB samples. We developed 24 binary classifiers of MTB drug resistance status, covering eight anti-MTB drugs and three ML algorithms (logistic regression, random forest and 1D CNN), using a training dataset of 10,575 MTB isolates collected from 16 countries across six continents; an extended pan-genome reference was used to detect genetic features. Our 1D CNN architecture was designed to integrate both sequential and non-sequential features. In terms of F1-score, the 1D CNN models are our best classifiers and are more accurate and stable than the state-of-the-art rule-based tool Mykrobe predictor (81.1 to 93.8%, 93.7 to 96.2%, 93.1 to 94.8%, 95.9 to 97.2% and 97.1 to 98.2% for ethambutol, rifampicin, pyrazinamide, isoniazid and ofloxacin, respectively). We applied filter-based feature selection to find AMR-relevant features. All selected variant features are AMR-related in the CARD database, and 78.8% of them also appear in the catalogue of MTB mutations recently identified by the WHO as associated with drug resistance. To facilitate ML model development for AMR prediction, we packaged every step into an automated pipeline and shared the source code at https://github.com/KuangXY3/MTB-AMR-classification-CNN.
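As a rough illustration of the filter-based feature selection mentioned in the abstract, the sketch below ranks binary variant features by how differently they occur in resistant and susceptible isolates and keeps the top k. The scoring statistic, the function name `filter_select` and the toy data are assumptions made for illustration; the abstract does not state which filter criterion the authors used.

```python
from typing import Dict, List


def filter_select(X: List[List[int]], y: List[int], names: List[str], k: int) -> List[str]:
    """Keep the k features whose presence differs most between classes.

    X is a presence/absence matrix (rows = isolates, columns = variants),
    y holds 1 for resistant and 0 for susceptible isolates.
    The score is |P(variant | resistant) - P(variant | susceptible)|,
    an assumed criterion chosen only to illustrate a filter method.
    """
    n_res = sum(y)
    n_sus = len(y) - n_res
    if n_res == 0 or n_sus == 0:
        raise ValueError("need both resistant and susceptible isolates")

    scores: Dict[str, float] = {}
    for j, name in enumerate(names):
        res_hits = sum(row[j] for row, label in zip(X, y) if label == 1)
        sus_hits = sum(row[j] for row, label in zip(X, y) if label == 0)
        scores[name] = abs(res_hits / n_res - sus_hits / n_sus)

    # Features are scored independently of any classifier, which is what
    # distinguishes a filter method from wrapper or embedded selection.
    return sorted(names, key=lambda n: scores[n], reverse=True)[:k]


# Toy data with hypothetical variant names.
X = [[1, 0, 1], [1, 0, 0], [0, 1, 1], [0, 1, 0]]
y = [1, 1, 0, 0]
names = ["variant_a", "variant_b", "variant_c"]
selected = filter_select(X, y, names, k=2)
```

Here `variant_c` occurs equally often in both classes, so it scores zero and is dropped.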

Highlights

Abbreviations: PATRIC, Pathosystems Resource Integration Center; SVM, support vector machine; WGS, whole-genome sequencing; SRA, Sequence Read Archive; DST, drug susceptibility test; TP, true positive; TN, true negative; FP, false positive; FN, false negative.
To compare our machine learning (ML) classifiers with the state-of-the-art statistical modeling method Mykrobe predictor, we evaluated the accuracy of Mykrobe predictor on the same dataset [14].
Our best ML classifiers outperformed the state-of-the-art rule-based method Mykrobe predictor, especially for ethambutol (EMB) resistance, and showed more stable accuracy across all four first-line drugs.
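The comparison above is reported in terms of F1-score, which is built from the TP, FP and FN counts listed in the abbreviations. The following is a minimal sketch of that standard calculation for a binary resistant/susceptible classifier; the function names and toy labels are illustrative and are not taken from the authors' pipeline.

```python
from typing import List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating 1 (resistant) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy labels: DST result (truth) versus predicted resistance.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
tp, tn, fp, fn = confusion_counts(y_true, y_pred)
score = f1_score(tp, fp, fn)
```

F1 does not use TN, which makes it less sensitive than plain accuracy to a large majority of susceptible isolates.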

Summary

Introduction

There is an urgent need to rapidly identify the drug sensitivity profiles of TB, because culture-based diagnostic tests are usually time-consuming. To overcome these restrictions and identify antibiotic resistance more efficiently, researchers use conventional association rule methods to predict antimicrobial resistance [6]. These methods are based on identifying variants associated with AMR in whole-genome sequencing (WGS) data. Our best ML classifiers outperformed the state-of-the-art rule-based method Mykrobe predictor, especially for EMB resistance, and showed more stable accuracy across all four first-line drugs. Although our basic 1D CNN architecture did not significantly outperform our traditional ML methods, logistic regression (LR) and random forest (RF), there are potential ways to optimize it in the future, e.g., hyperparameter tuning.
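The rule-based idea described in the introduction, calling an isolate resistant when it carries a variant already associated with AMR, can be sketched in a few lines. This is a deliberately simplified illustration and not how Mykrobe predictor is implemented; the catalogue contents and variant names are hypothetical placeholders.

```python
from typing import Dict, Iterable, Set

# Hypothetical catalogue: drug -> variants treated as resistance-associated.
# The variant names are placeholders, not real catalogue entries.
CATALOGUE: Dict[str, Set[str]] = {
    "rifampicin": {"geneA_var1", "geneA_var2"},
    "isoniazid": {"geneB_var1"},
}


def predict_resistance(
    variants: Iterable[str],
    catalogue: Dict[str, Set[str]] = CATALOGUE,
) -> Dict[str, bool]:
    """Flag a drug as resistant if any catalogued variant is observed."""
    observed = set(variants)
    return {drug: bool(observed & known) for drug, known in catalogue.items()}


# Variants called from one isolate's WGS data (toy example).
profile = predict_resistance(["geneA_var2", "geneC_var9"])
```

A rule of this kind can only recognise variants already in its catalogue, whereas the ML classifiers discussed here learn associations from labelled isolates.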
