Abstract

Metagenomic studies unravel details about the taxonomic composition and the functions performed by microbial communities. As a complete metagenomic analysis requires different tools for different purposes, the selection and setup of these tools remain challenging. Furthermore, the chosen toolset affects the accuracy, the formatting, and the functional identifiers reported in the results, which in turn affects how the results are interpreted and the biological answer obtained. Thus, we surveyed state-of-the-art tools available in the literature, created simulated datasets, and performed benchmarks to design a sensitive and flexible metagenomic analysis pipeline. Here we present MEDUSA, an efficient pipeline to conduct comprehensive metagenomic analyses. It performs preprocessing, assembly, alignment, taxonomic classification, and functional annotation on shotgun data, and supports user-built dictionaries to transfer annotations to any functional identifier. MEDUSA includes several tools, such as fastp, Bowtie2, DIAMOND, Kaiju, and MEGAHIT, as well as a novel tool implemented in Python to transfer annotations to BLAST/DIAMOND alignment results. These tools are installed via Conda, and the workflow is managed by Snakemake, easing the setup and execution. Compared with MEGAN 6 Community Edition, MEDUSA correctly identifies more species, especially the less abundant ones, and is better suited for functional analysis using Gene Ontology identifiers.
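The idea of transferring annotations to alignment results through a user-built dictionary can be illustrated with a minimal sketch. This is not MEDUSA's actual tool or interface. It assumes the alignments are in the standard 12-column BLAST/DIAMOND tabular format (query ID in the first column, subject ID in the second) and that the dictionary is a two-column, tab-separated file mapping subject IDs to functional identifiers. The function names, accession numbers, and GO terms below are hypothetical and chosen only for illustration.

```python
import csv
import io
from collections import defaultdict


def load_dictionary(handle):
    """Read a two-column TSV (subject ID, functional identifier) into a dict of sets."""
    mapping = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) >= 2:
            mapping[row[0]].add(row[1])
    return mapping


def transfer_annotations(alignments, mapping):
    """Yield (query, subject, identifier) for every hit whose subject is in the dictionary.

    Assumes tabular alignment rows with the query ID in column 1 and the
    subject ID in column 2; hits without a dictionary entry are skipped.
    """
    for row in csv.reader(alignments, delimiter="\t"):
        if len(row) < 2:
            continue
        query, subject = row[0], row[1]
        for identifier in sorted(mapping.get(subject, ())):
            yield query, subject, identifier


# Toy inputs (hypothetical accessions and GO terms, for illustration only).
DICTIONARY_TSV = (
    "P12345\tGO:0008152\n"
    "P12345\tGO:0005737\n"
    "Q67890\tGO:0003677\n"
)
ALIGNMENTS_TSV = (
    "read_1\tP12345\t98.0\t50\t1\t0\t1\t150\t10\t59\t1e-20\t100.0\n"
    "read_2\tQ67890\t91.5\t47\t4\t0\t3\t143\t22\t68\t3e-15\t85.1\n"
    "read_3\tX00000\t88.0\t45\t5\t1\t1\t135\t30\t74\t2e-10\t70.3\n"
)

mapping = load_dictionary(io.StringIO(DICTIONARY_TSV))
annotated = list(transfer_annotations(io.StringIO(ALIGNMENTS_TSV), mapping))

for query, subject, identifier in annotated:
    print(query, subject, identifier, sep="\t")
```

Because the dictionary is an ordinary lookup table, the same alignment results could be re-annotated with a different identifier system simply by supplying a different mapping file, without repeating the alignment step.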

Highlights

  • SOAPnuke was removed from the results for presenting outputs with a different number of reads when the only parameter change was the number of cores

Summary

Introduction

The recent reduction of sequencing costs, a consequence of advances in second-generation sequencing technology, has notably benefited the metagenomics field. Metagenome shotgun sequencing has become widely used, allowing microbial DNA from an environmental sample to be sequenced without selecting any particular gene. Shotgun data contain information about the functional activity of the microbial community, adding ecological information to metagenomic studies. There are two metagenomic analysis approaches: read classification and metagenomic assembly (Breitwieser et al, 2019). These approaches share common analysis steps, such as data preprocessing, alignment against a reference database, taxonomic classification, and functional annotation. Read classification is useful for organisms with close relatives represented in the reference database. For samples collected from exotic environments, where no close relatives are expected to be found in the reference database, the assembly approach is preferable. The choice of toolset impacts the analysis results and conclusions (Lindgreen et al, 2016), and efficiently selecting a toolset to conduct a complete metagenomic analysis remains challenging.
