Escherichia coli (E. coli) has become a particular concern because of the increasing incidence of antimicrobial resistance (AMR) observed worldwide. Using machine learning (ML) to predict E. coli AMR is more efficient than traditional laboratory testing, but further improving the predictive performance of existing models remains challenging. In this study, we collected 1,937 high-quality whole genome sequencing (WGS) datasets with antimicrobial resistance phenotypes from public databases and modified the existing workflow by adding an attention mechanism, so that the modified workflow focuses more on core single nucleotide polymorphisms (SNPs) that may contribute substantially to the development of AMR in E. coli. In addition to comparing model performance before and after adding the attention mechanism, we performed a cross-comparison among published models based on random forest (RF), support vector machine (SVM), logistic regression (LR), and convolutional neural network (CNN). Our study demonstrates that the discriminative positional colors of Chaos Game Representation (CGR) images can selectively influence and highlight genome regions without prior knowledge, enhancing prediction accuracy. Furthermore, we developed an online tool (https://github.com/tjiaa/E.coli-ML/tree/main) to assist clinicians in rapidly predicting the AMR phenotype of E. coli and to accelerate clinical decision-making.
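For readers unfamiliar with Chaos Game Representation, the following is a minimal sketch of how a nucleotide sequence can be mapped to a CGR-style count grid. It is not the workflow used in this study: the corner assignment is one common convention, the function names `cgr_points` and `cgr_image` and the resolution parameter `k` are hypothetical, ambiguous bases are simply skipped as a simplification, and the coloring scheme and attention mechanism described above are not reproduced.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not taken from the study or its online tool.

# One common corner assignment for the unit square (other layouts exist).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Return the CGR coordinates for each unambiguous base in seq."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in seq.upper():
        corner = CORNERS.get(base)
        if corner is None:
            continue  # simplification: skip N and other ambiguous symbols
        # Move halfway from the current point towards the base's corner.
        x = (x + corner[0]) / 2.0
        y = (y + corner[1]) / 2.0
        points.append((x, y))
    return points


def cgr_image(seq, k=3):
    """Bin CGR points into a 2^k x 2^k grid of counts (rows index y, columns index x)."""
    size = 2 ** k
    image = [[0] * size for _ in range(size)]
    for x, y in cgr_points(seq):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] += 1
    return image


if __name__ == "__main__":
    for row in cgr_image("ACGTTGCAACGT", k=2):
        print(row)
```

Because each point depends on the bases that precede it, each cell of the grid collects positions sharing the same trailing k-mer. This gives image regions a positional meaning that a convolutional or attention-based model can exploit.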