ARGprofiler-a pipeline for large-scale analysis of antimicrobial resistance genes and their flanking regions in metagenomic datasets.

Hannah-Marie Martiny, Oksana Lukjančenko, Frank M Aarestrup, Philip T L C Clausen, Nikiforos Pyrounakis, Thomas N Petersen, Patrick Munk, Can Alkan

doi:10.1093/bioinformatics/btae086

Open Access

Abstract

Analyzing metagenomic data can be highly valuable for understanding the function and distribution of antimicrobial resistance genes (ARGs). However, standardized and reproducible workflows are needed to ensure that studies are comparable, as the current options involve various tools and reference databases, each designed with a specific purpose in mind. In this work, we have created the workflow ARGprofiler to process large amounts of raw sequencing reads for studying the composition, distribution, and function of ARGs. ARGprofiler tackles the challenge of deciding which reference database to use by providing the PanRes database of 14078 unique ARGs, which combines several existing collections into one. Our pipeline is designed not only to produce abundance tables of genes and microbes but also to reconstruct the flanking regions of ARGs with ARGextender. ARGextender is a bioinformatic approach that combines KMA and SPAdes to recruit reads for a targeted de novo assembly. While our focus is on ARGs, the pipeline also creates Mash sketches for fast searching and comparison of sequencing runs. The ARGprofiler pipeline is a Snakemake workflow that supports the reuse of metagenomic sequencing data; it is easily installable and is maintained at https://github.com/genomicepidemiology/ARGprofiler.
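
To illustrate why sketches allow fast comparison of sequencing runs, the following is a minimal Python sketch of the MinHash idea that Mash builds on. It is not Mash's implementation and is not part of ARGprofiler: the function names, the hash function (BLAKE2b), the k-mer length, and the sketch size are all assumptions chosen for illustration, and reverse-complement (canonical) k-mers are ignored for simplicity.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all k-mers in a sequence (hypothetical helper)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=5, size=16):
    """Bottom-s MinHash sketch: the `size` smallest k-mer hash values.

    k and size are illustrative values, not the settings used by the pipeline.
    """
    hashes = {
        int.from_bytes(
            hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big"
        )
        for kmer in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=16):
    """Estimate the Jaccard index of two k-mer sets from their sketches."""
    # Take the smallest hashes of the combined sketches, then count how
    # many of them occur in both sketches.
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)


if __name__ == "__main__":
    run_a = "ATGCGTACGTTAGCATGCGTACGTTAGC"
    run_b = "ATGCGTACGTTAGCATGCGTACGAAAGC"
    print(jaccard_estimate(sketch(run_a), sketch(run_b)))
```

Because each run is reduced to a small, fixed-size list of hash values, two runs can be compared without revisiting the raw reads, which is the property that makes sketch-based searching fast.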
