Abstract
YAMP ("Yet Another Metagenomics Pipeline") is a user-friendly workflow that enables the analysis of whole shotgun metagenomic data while using containerization to ensure computational reproducibility and facilitate collaborative research. YAMP can be executed on any UNIX-like system and supports multiple job schedulers as well as the Amazon Web Services (AWS) cloud. Although YAMP was developed to be ready for use by nonexperts, bioinformaticians will appreciate its flexibility, modularization, and simple customization.
Highlights
Thanks to the increased cost-effectiveness of high-throughput technologies, the number of studies collecting and analyzing large amounts of data has surged, opening new challenges for data analysis and research reproducibility.
To facilitate the discussion of YAMP's computational requirements and to assess its ability to reproduce research results described in the literature, we carried out a real-world case study, which included 18 samples collected from different body sites.
Although both the simulation and the real-world case study focus on human metagenomic data, YAMP can be used for the analysis of data that originate from virtually any environment.
Summary
Thanks to the increased cost-effectiveness of high-throughput technologies, the number of studies collecting and analyzing large amounts of data has surged, opening new challenges for data analysis and research reproducibility. Variations across workstations and operating systems represent another obstacle [5, 6]. To overcome this issue, tools that allow the development of workflows [7] and software containers [8] have been proposed [9]. Containerized workflows facilitate collaborative projects by ensuring identical analysis processes and comparable results, and they allow the automation of data-intensive repetitive tasks [11]. They save users with little bioinformatics or computational expertise the hassle of installing the required software and of designing and implementing often complex analysis orchestrations, while expert bioinformaticians can use them as a starting point for customized analyses, avoiding redundant solutions.