Evaluation of Machine Learning Models for Predicting Antimicrobial Resistance of Actinobacillus pleuropneumoniae From Whole Genome Sequences.

Zhichang Liu, Zhenming Li, Ting Rong, Shuhong Li, Gang Wang, Jian Sun, Luchao Lv, Xianyong Ma, Guanghui Peng, Huijie Lu, Jiazhou Li, Dun Deng

doi:10.3389/fmicb.2020.00048

Abstract

Antimicrobial resistance (AMR) is becoming a major problem worldwide, and new approaches to identifying strains that are resistant or susceptible to particular antibiotics are essential in the fight against antibiotic-resistant pathogens. Genotype-based machine learning methods have shown great promise as diagnostic tools, owing to the increasing availability of genomic datasets and antimicrobial susceptibility testing (AST) phenotypes. In this article, Support Vector Machine (SVM) and Set Covering Machine (SCM) models were used to learn and predict resistance to five drugs (Tetracycline, Ampicillin, Sulfisoxazole, Trimethoprim, and Enrofloxacin). The SVM model used the number of k-mers co-occurring between the genome of each isolate and the reference genes to learn and predict the phenotype of the bacteria against a specific antimicrobial, while the SCM model used a greedy approach to construct conjunctions or disjunctions of Boolean functions, finding the most concise set of k-mers that allows accurate prediction of the phenotype. Five-fold cross-validation was performed on the training set of the SVM and SCM models to select the best hyperparameter values and avoid overfitting. The training accuracy (mean cross-validation score) and the testing accuracy of the SVM and SCM models for all five drugs were above 90%, regardless of whether the resistance mechanism was acquired resistance or a point mutation in the chromosome. The correlation between the phenotypes and the model predictions for the five drugs indicated that both the SVM and SCM models could significantly distinguish resistant isolates from sensitive isolates (p < 0.01), and could be used as potential tools in antimicrobial resistance surveillance and clinical diagnosis in veterinary medicine.
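To make the two feature ideas in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It shows (a) a shared k-mer count of the kind the SVM is described as using, and (b) a simplified greedy search for a conjunction of k-mer presence/absence rules in the spirit of the SCM. All names (`kmers`, `shared_kmer_count`, `greedy_conjunction`, `predict_resistant`), the penalty `p`, the cap `max_rules`, and the choice to count distinct k-mers are assumptions made for illustration; the paper's k-mer length and hyperparameters are not given in this excerpt.

```python
from typing import List, Set, Tuple

# A rule is (k-mer, must_be_present): True means "k-mer is present",
# False means "k-mer is absent". This encoding is an illustrative assumption.
Rule = Tuple[str, bool]


def kmers(seq: str, k: int) -> Set[str]:
    """Return the distinct substrings of length k found in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_count(genome: str, reference: str, k: int) -> int:
    """SVM-style feature: number of distinct k-mers present in both sequences."""
    return len(kmers(genome, k) & kmers(reference, k))


def rule_holds(rule: Rule, sample: Set[str]) -> bool:
    """Check whether a presence/absence rule is satisfied by a sample's k-mer set."""
    kmer, must_be_present = rule
    return (kmer in sample) == must_be_present


def greedy_conjunction(samples: List[Set[str]], labels: List[bool],
                       p: float = 1.0, max_rules: int = 3) -> List[Rule]:
    """Greedily pick rules whose conjunction separates resistant (True) samples
    from sensitive (False) ones. Each step favours the rule that rejects the
    most remaining sensitive samples, penalised by p per resistant sample lost.
    """
    candidates = [(km, flag)
                  for km in sorted(set().union(*samples))
                  for flag in (True, False)]
    negatives = {i for i, y in enumerate(labels) if not y}
    positives = {i for i, y in enumerate(labels) if y}
    chosen: List[Rule] = []

    while negatives and candidates and len(chosen) < max_rules:
        def utility(rule: Rule) -> float:
            covered = sum(not rule_holds(rule, samples[i]) for i in negatives)
            lost = sum(not rule_holds(rule, samples[i]) for i in positives)
            return covered - p * lost

        best = max(candidates, key=utility)
        if utility(best) <= 0:
            break  # no remaining rule improves the model
        chosen.append(best)
        # Keep only the samples the conjunction still labels as resistant.
        negatives = {i for i in negatives if rule_holds(best, samples[i])}
        positives = {i for i in positives if rule_holds(best, samples[i])}
    return chosen


def predict_resistant(rules: List[Rule], sample: Set[str]) -> bool:
    """A conjunction predicts 'resistant' only if every rule holds.
    Note that an empty rule list predicts 'resistant' for every sample."""
    return all(rule_holds(r, sample) for r in rules)
```

In this sketch the SVM feature is a single integer per isolate and reference gene, whereas the SCM-style model returns a short, human-readable list of k-mer rules, which is the interpretability trade-off the abstract alludes to.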

Highlights

Antimicrobial resistance (AMR) in bacteria from humans and food-producing animals is becoming an urgent threat to the control of bacterial infections.
We propose applying the Support Vector Machine (SVM) and Set Covering Machine (SCM) algorithms to the whole genomes of 96 isolates of A. pleuropneumoniae to accurately predict their phenotypes against five antimicrobial agents (Tetracycline, Ampicillin, Sulfisoxazole, Trimethoprim, and Enrofloxacin).
Eight isolates were resistant to four antimicrobials (Tetracycline, Ampicillin, Sulfisoxazole, and Trimethoprim). Seventeen isolates were resistant to three antimicrobials: 10 to Tetracycline, Ampicillin, and Sulfisoxazole, and 7 to Tetracycline, Ampicillin, and Trimethoprim. Twenty-two isolates were resistant to two antimicrobials: 20 to Tetracycline and Sulfisoxazole, one to Tetracycline and Ampicillin, and one to Sulfisoxazole and Trimethoprim. Eighteen isolates were resistant to a single antimicrobial, 12 and 6 of them

Summary

MATERIALS AND METHODS

Antimicrobial resistance (AMR) in bacteria from humans and food-producing animals is becoming an urgent threat to the control of bacterial infections. We propose applying the Support Vector Machine (SVM) and Set Covering Machine (SCM) algorithms to the whole genomes of 96 isolates of A. pleuropneumoniae to accurately predict their phenotypes against five antimicrobial agents (tetracycline, ampicillin, sulfisoxazole, trimethoprim, and enrofloxacin). The WGS reads and the binary resistance phenotypes for these five antimicrobial agents in the 96 A. pleuropneumoniae isolates were obtained from Bossé et al. (2017). The SVM (radial basis function kernel) used the number of k-mers co-occurring between each strain and the reference genes for the specific antimicrobial to learn and predict the phenotype of each isolate. For model evaluation, TP was the number of resistant strains predicted to be resistant, TN was the number of sensitive strains predicted to be sensitive, FP was the number of sensitive strains predicted to be resistant, and FN was the number of resistant strains predicted to be sensitive.
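The TP, TN, FP, and FN definitions above can be sketched in a few lines of Python. The formulas these counts feed into are not included in this excerpt, so the accuracy, sensitivity, and specificity functions below use the standard textbook definitions as an assumption; the function names and the encoding of "resistant" as `True` are likewise illustrative choices rather than the authors' code.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[bool], y_pred: Sequence[bool]) -> Dict[str, int]:
    """Count TP, TN, FP, FN, with True meaning 'resistant' and False 'sensitive'."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            counts["TP"] += 1   # resistant, predicted resistant
        elif not truth and not pred:
            counts["TN"] += 1   # sensitive, predicted sensitive
        elif not truth and pred:
            counts["FP"] += 1   # sensitive, predicted resistant
        else:
            counts["FN"] += 1   # resistant, predicted sensitive
    return counts


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def accuracy(c: Dict[str, int]) -> float:
    """Fraction of all isolates classified correctly (assumed standard definition)."""
    return _ratio(c["TP"] + c["TN"], c["TP"] + c["TN"] + c["FP"] + c["FN"])


def sensitivity(c: Dict[str, int]) -> float:
    """Fraction of resistant isolates predicted resistant (assumed standard definition)."""
    return _ratio(c["TP"], c["TP"] + c["FN"])


def specificity(c: Dict[str, int]) -> float:
    """Fraction of sensitive isolates predicted sensitive (assumed standard definition)."""
    return _ratio(c["TN"], c["TN"] + c["FP"])
```

Reporting sensitivity and specificity alongside accuracy is useful here because the resistance classes may be unbalanced for some drugs, in which case accuracy alone can look high even when one class is poorly predicted.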

RESULTS

DISCUSSION

DATA AVAILABILITY STATEMENT