Abstract

With the ever-increasing availability of whole-genome sequences, machine-learning approaches can be used as an alternative to traditional alignment-based methods for identifying new antimicrobial-resistance genes. Such approaches are especially helpful when pathogens cannot be cultured in the lab. In previous work, we proposed a game-theory-based feature evaluation algorithm. When using the protein characteristics identified by this algorithm, called ‘features’ in machine learning, our model accurately identified antimicrobial resistance (AMR) genes in Gram-negative bacteria. Here we extend our study to Gram-positive bacteria, showing that coupling game-theory-identified features with machine learning achieved classification accuracies between 87% and 90% for genes encoding resistance to the antibiotics bacitracin and vancomycin. Importantly, we present a standalone software tool that implements the game-theory algorithm and machine-learning model used in these studies.
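To make concrete what a protein "feature" can look like, here is a minimal sketch that turns a protein sequence into a fixed-length numeric vector. It assumes amino-acid composition as an illustrative feature; the abstract does not list which protein characteristics the model actually uses, and `aa_composition` is a hypothetical helper name, not part of the authors' tool.

```python
from collections import Counter

# The 20 standard amino acids in a fixed order, so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence: str) -> list[float]:
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard or ambiguous residues (e.g. X, B, Z) are skipped, so the
    fractions are computed over standard residues only and sum to 1.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        # No standard residues: return an all-zero vector of the same length.
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


if __name__ == "__main__":
    # Hypothetical toy sequence, not a real resistance protein.
    features = aa_composition("MKKLLAAX")
    print({aa: round(v, 3) for aa, v in zip(AMINO_ACIDS, features) if v > 0})
```

For the toy sequence, the ambiguous residue X is ignored, so A, K and L each receive weight 2/7 and M receives 1/7. Every protein maps to a vector of the same length, which is what a classifier needs, regardless of sequence length or similarity to a reference.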

Highlights

  • With the ever-increasing availability of whole-genome sequences, machine-learning approaches can be used as an alternative to traditional alignment-based methods for identifying new antimicrobial-resistance genes

  • We recently introduced a game-theory-based feature selection approach (“game theoretic dynamic weighting based feature evaluation”, or GTDWFE) predicated on the supposition that a single feature might provide limited predictive value on its own, but might contribute to a strong coalition when used with other features[21]

  • The combination of GTDWFE and support vector machine (SVM) resulted in correct classification rates of 93%, 99%, and 97% for aac, bla, and dfr, respectively
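The coalition supposition behind GTDWFE can be illustrated with the Shapley value from cooperative game theory. The sketch below is not GTDWFE, whose dynamic weighting scheme is not described here; it is a generic, swapped-in illustration that assumes a hypothetical value function `toy_value` scoring a set of features (in practice this might be, for example, the cross-validated accuracy of an SVM trained on that set). It shows how a feature worth nothing alone can still earn a large share of credit from what it adds to a coalition.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, Sequence


def shapley_values(features: Sequence[str],
                   value: Callable[[FrozenSet[str]], float]) -> dict[str, float]:
    """Exact Shapley value of each feature under a coalition value function.

    phi_i = sum over coalitions S not containing i of
            |S|! (n - |S| - 1)! / n! * (v(S + {i}) - v(S))
    The cost is exponential in n, so this is only practical for a few features.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            # Weight of a coalition of size k in the Shapley average.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when it joins coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_value(coalition: FrozenSet[str]) -> float:
    # Hypothetical value function: f1 and f2 are useless alone but
    # valuable together; f3 adds a small independent gain.
    score = 1.0 if {"f1", "f2"} <= coalition else 0.0
    if "f3" in coalition:
        score += 0.1
    return score


if __name__ == "__main__":
    print(shapley_values(["f1", "f2", "f3"], toy_value))
```

Although `f1` and `f2` each score 0 on their own, each receives a Shapley value of 0.5, while `f3` receives about 0.1. The values sum to the worth of the full coalition (1.1), so a per-feature ranking that ignored interactions would have discarded the two most useful features.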


Summary

Introduction

With the ever-increasing availability of whole-genome sequences, machine-learning approaches can be used as an alternative to traditional alignment-based methods for identifying new antimicrobial-resistance genes. One conventional strategy for identifying genetically encoded mechanisms of AMR involves sequence assembly[14,15,16,17] and read-based techniques[18,19,20] that map sequence data directly to reference databases. Although these methods perform well for known and highly conserved AMR genes, they may produce an unacceptable number of false positives (genes predicted to encode resistance when they do not) for highly dissimilar sequences, as was demonstrated previously for Gram-negative bacteria[21]. Several machine-learning methods have been proposed to identify novel AMR genes from metagenomic and pan-genome data[12, 22, 23], but these methods used a small number of genetic features for predictions. These approaches also did not use a feature-selection strategy to remove irrelevant and redundant features that might compromise the accuracy of a machine-learning model.

Protein names: Putative undecaprenol kinase Undecaprenol kinase bacA Putative undecaprenol kinase bacitracin resistance protein Putative undecaprenol kinase Undecaprenyl-diphosphatase UppP Hypothetical protein MAQA_05683 Serine O-acetyltransferase
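The point about removing irrelevant and redundant features can be made concrete with a simple filter. The sketch below is a generic greedy relevance-minus-redundancy heuristic in the spirit of mRMR, using absolute Pearson correlation as a stand-in for both relevance and redundancy; it is not the game-theoretic GTDWFE method, and `select_features`, `min_relevance`, and the toy columns are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_features(columns: dict[str, list[float]], labels: list[float],
                    k: int, min_relevance: float = 0.1) -> list[str]:
    """Greedy relevance-minus-redundancy feature selection.

    Relevance:  |corr(feature, label)|; features below min_relevance are
                dropped up front as irrelevant.
    Redundancy: mean |corr(feature, f)| over already selected features f.
    Each step picks the candidate maximising relevance - redundancy.
    """
    relevance = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    candidates = [name for name, r in relevance.items() if r >= min_relevance]
    selected: list[str] = []

    def score(name: str) -> float:
        # Relevance minus mean redundancy with the features chosen so far.
        if not selected:
            return relevance[name]
        redundancy = sum(abs(pearson(columns[name], columns[s])) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: 6 proteins, binary label (1 = resistant).
    labels = [0, 0, 0, 1, 1, 1]
    columns = {
        "f_a": [3, 1, 2, 5, 3, 4],       # relevant
        "f_a_copy": [3, 1, 2, 5, 3, 4],  # exact duplicate of f_a (redundant)
        "f_b": [3, 3, 0, 5, 5, 2],       # less relevant, but less redundant
        "f_noise": [1, 1, 1, 1, 1, 1],   # constant, hence irrelevant
    }
    print(select_features(columns, labels, k=2))
```

On the toy data, the constant column `f_noise` is discarded as irrelevant, and the exact duplicate `f_a_copy` loses to the less relevant but less redundant `f_b`, so the call returns `['f_a', 'f_b']`. Note that this filter scores features one pair at a time, which is exactly the limitation a coalition-based method is meant to address.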
