Discordance between different bioinformatic methods for identifying resistance genes from short-read genomic data, with a focus on Escherichia coli.

Timothy J Davies, Jeremy Swann, Nicole Stoesser, Hayleigh Pickford, Derrick W Crook, Philip W Fowler, Katie L Hopkins, Muna F Anjum, Matthew J Ellington, A Sarah Walker, Samuel Lipworth, Manal Abuoun, Anna E Sheppard, Susan Hopkins, Timothy E A Peto

doi:10.1099/mgen.0.001151

Abstract

Several bioinformatics genotyping algorithms are now commonly used to characterize antimicrobial resistance (AMR) gene profiles in whole-genome sequencing (WGS) data, with a view to understanding AMR epidemiology and developing resistance prediction workflows using WGS in clinical settings. Accurately evaluating AMR in Enterobacterales, particularly Escherichia coli, is of major importance, because this is a common pathogen. However, robust comparisons of different genotyping approaches on relevant simulated and large real-life WGS datasets are lacking. Here, we used both simulated datasets and a large set of real E. coli WGS data (n=1818 isolates) to systematically investigate genotyping methods in greater detail. Simulated constructs and real sequences were processed using four different bioinformatic programs (ABRicate, ARIBA, KmerResistance and SRST2, run with the ResFinder database) and their outputs compared. For simulation tests where 3079 AMR gene variants were inserted into random sequence constructs, KmerResistance was correct for 3076 (99.9 %) simulations, ABRicate for 3054 (99.2 %), ARIBA for 2783 (90.4 %) and SRST2 for 2108 (68.5 %). For simulation tests where two closely related gene variants were inserted into random sequence constructs, KmerResistance identified the correct alleles in 35 338/46 318 (76.3 %) simulations, ABRicate identified them in 11 842/46 318 (25.6 %) simulations, ARIBA identified them in 1679/46 318 (3.6 %) simulations and SRST2 identified them in 2000/46 318 (4.3 %) simulations. In real data, across all methods, 1392/1818 (76 %) isolates had discrepant allele calls for at least 1 gene. In addition to highlighting areas for improvement in challenging scenarios (e.g. identification of AMR genes at <10× coverage, identifying multiple closely related AMR genes present in the same sample), our evaluations identified some more systematic errors that could be readily resolved, such as repeated misclassification (i.e. naming) of genes as shorter variants of the same gene present within the reference resistance gene database. Such naming errors accounted for at least 2530/4321 (59 %) of the discrepancies seen in real data. Moreover, many of the remaining discrepancies were likely 'artefactual', with reporting of cut-off differences accounting for at least 1430/4321 (33 %) discrepants. Whilst we found that comparing outputs generated by running multiple algorithms on the same dataset could identify and resolve these algorithmic artefacts, the results of our evaluations emphasize the need for developing new and more robust genotyping algorithms to further improve accuracy and performance.
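The comparison described in the abstract can be illustrated with a minimal Python sketch. It recomputes the single-gene simulation accuracies from the counts quoted above and flags genes for which the four programs disagree on the allele call. The function names, the grouping of calls by gene family, and the example isolate are assumptions made for illustration; they are not the authors' pipeline, and the example calls are not study data.

```python
# Counts of correct single-gene simulations, as quoted in the abstract.
SINGLE_GENE_CORRECT = {
    "KmerResistance": 3076,
    "ABRicate": 3054,
    "ARIBA": 2783,
    "SRST2": 2108,
}
TOTAL_SINGLE_GENE = 3079


def percent_correct(correct, total):
    """Return the percentage of correct simulations, rounded to 1 decimal."""
    return round(100 * correct / total, 1)


def discordant_genes(calls_by_tool):
    """Return genes whose allele call differs between tools for one isolate.

    calls_by_tool maps tool name -> {gene family: allele name}. A tool that
    did not report a gene is treated as calling None for it, so presence
    versus absence also counts as a discrepancy. (Hypothetical helper.)
    """
    genes = set()
    for calls in calls_by_tool.values():
        genes.update(calls)

    discordant = {}
    for gene in sorted(genes):
        per_tool = {tool: calls.get(gene) for tool, calls in calls_by_tool.items()}
        if len(set(per_tool.values())) > 1:
            discordant[gene] = per_tool
    return discordant


# Hypothetical allele calls for one isolate (illustrative only).
example_isolate = {
    "ABRicate": {"blaTEM": "blaTEM-1B", "sul2": "sul2"},
    "ARIBA": {"blaTEM": "blaTEM-1B", "sul2": "sul2"},
    "KmerResistance": {"blaTEM": "blaTEM-1B", "sul2": "sul2"},
    "SRST2": {"blaTEM": "blaTEM-1A", "sul2": "sul2"},
}

if __name__ == "__main__":
    for tool, correct in SINGLE_GENE_CORRECT.items():
        print(tool, percent_correct(correct, TOTAL_SINGLE_GENE))
    print(discordant_genes(example_isolate))
```

In this sketch an isolate counts as discrepant if `discordant_genes` returns a non-empty result. Distinguishing naming errors from reporting cut-off differences, as the abstract does, would need further information (for example coverage and identity values) that is not modelled here.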

Journal: Microbial Genomics
Publication Date: Dec 15, 2023
Citations: 1
License type: CC BY 4.0

Similar Papers

Identification of antimicrobial resistance genes and drug resistance analysis of Escherichia coli in the animal farm environment
Jin-Ju Peng ... Wen-Chao Liu
Journal of Infection and Public Health | VOL. 14
01 Nov 2021

Identification of faecal Escherichia coli isolates with similar patterns of virulence and antimicrobial resistance genes in dogs and their owners.
Zahra Naziri ... Sahar Zare
Veterinary medicine and science | VOL. 9
12 Oct 2022

Plasmid Composition, Antimicrobial Resistance and Virulence Genes Profiles of Ciprofloxacin- and Third-Generation Cephalosporin-Resistant Foodborne Salmonella enterica Isolates from Russia
Anna Egorova ... Vasiliy Akimkin
Microorganisms | VOL. 11
30 Jan 2023

Distribution of Antimicrobial Resistance Genes Across Salmonella enterica Isolates from Animal and Nonanimal Foods
J.B Pettengill ... M.C Bazaco
Journal of food protection | VOL. 83
01 Feb 2020