Abstract

Machine learning can be used as an alternative to similarity algorithms such as BLASTp when the latter fail to identify dissimilar antimicrobial-resistance genes (ARGs) in bacteria; however, determining the most informative characteristics, known as features, for antimicrobial resistance (AMR) is essential to obtain accurate predictions. In this paper, we introduce a feature selection algorithm called symmetrical uncertainty qualitative mutual information (SU-QMI), which selects features based on estimates of their relevance, redundancy, and interdependency. We use these together with graph theory to derive a feature selection method for identifying putative ARGs in Gram-negative bacteria. We extract physicochemical, evolutionary, and structural features from the protein sequences of five genera of Gram-negative bacteria—Acinetobacter, Klebsiella, Campylobacter, Salmonella, and Escherichia—which confer resistance to acetyltransferase (aac), β-lactamase (bla), and dihydrofolate reductase (dfr). Our SU-QMI algorithm is then used to find the best subset of features, and a support vector machine (SVM) model is trained for AMR prediction using this feature subset. We evaluate performance using an independent set of protein sequences from three Gram-negative bacterial genera—Pseudomonas, Vibrio, and Enterobacter—and achieve prediction accuracy ranging from 88 to 100%. Compared to the SU-QMI method, BLASTp requires similarity as low as 53% for comparable classification results. Our results indicate the effectiveness of the SU-QMI method for selecting the best protein features for AMR prediction in Gram-negative bacteria.

Highlights

  • Thousands of people in the United States die each year due to infections by antimicrobial-resistant bacteria [1,2]

  • Machine learning algorithms are not restricted to sequence similarity, which makes them a promising alternative for identifying unrecognized antimicrobial-resistance genes (ARGs) in bacteria

  • We introduce a graph-theoretic feature selection algorithm called symmetrical uncertainty qualitative mutual information (SU-QMI) in which a feature is selected based on estimates of its relevance, nonredundancy, and interdependency


Summary

Introduction

Thousands of people in the United States die each year due to infections by antimicrobial-resistant bacteria [1,2]. When new antimicrobial-resistance genes (ARGs) emerge in a population, it may be difficult or impossible to recognize these genes using conventional sequence similarity algorithms. Sequence matching algorithms such as BLASTp can be applied to find ARGs in bacterial genomes, but they do not work well for dissimilar sequences unless their similarity thresholds are relaxed considerably. Our feature selection algorithm, SU-QMI, combines symmetrical uncertainty [4], qualitative mutual information [5], and graph theory to select features for predicting AMR in Gram-negative bacteria. The performance of our machine learning model is compared with BLASTp results.
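To make the symmetrical uncertainty component concrete, the sketch below computes the standard definition, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), for discrete variables and uses it to rank features by relevance to a class label. It is only an illustration of that one ingredient: the qualitative mutual information and graph-theoretic steps of SU-QMI are not reproduced here, and the function names, feature names, and toy values are all hypothetical rather than taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """Standard SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so nothing is shared
    hxy = entropy(list(zip(x, y)))   # joint entropy H(X, Y)
    mutual_info = hx + hy - hxy      # I(X; Y) = H(X) + H(Y) - H(X, Y)
    return 2.0 * mutual_info / (hx + hy)


# Made-up toy data: 1 = resistant, 0 = susceptible (illustrative only).
labels = [1, 1, 1, 0, 0, 0, 1, 0]
features = {
    "f_copy":    [1, 1, 1, 0, 0, 0, 1, 0],  # identical to the labels
    "f_partial": [1, 1, 0, 0, 0, 0, 1, 1],  # agrees with the labels on 6 of 8 samples
    "f_const":   [1, 1, 1, 1, 1, 1, 1, 1],  # carries no information
}

# Rank features from most to least relevant to the label.
scores = {name: symmetrical_uncertainty(col, labels) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: SU = {scores[name]:.3f}")
```

Because SU normalizes mutual information by the two entropies, scores are comparable across features with different numbers of distinct values. Continuous features would need to be discretized before applying this sketch.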
