Abstract

YAMP ("Yet Another Metagenomics Pipeline") is a user-friendly workflow that enables the analysis of whole shotgun metagenomic data while using containerization to ensure computational reproducibility and facilitate collaborative research. YAMP can be executed on any UNIX-like system and offers seamless support for multiple job schedulers as well as for the Amazon AWS cloud. Although YAMP was developed to be ready to use by nonexperts, bioinformaticians will appreciate its flexibility, modularization, and simple customization.

Highlights

  • Thanks to the increased cost-effectiveness of high-throughput technologies, the number of studies collecting and analyzing large amounts of data has surged, opening new challenges for data analysis and research reproducibility

  • To facilitate the discussion on YAMP computational requirements and to assess its ability to reproduce research results described in the literature, we carried out a real-world case study, which included 18 samples collected from different body sites

  • Although both the simulation and the real-world case study focus on human metagenomic data, YAMP can be used to analyze data originating from virtually any environment

Summary

Introduction

Thanks to the increased cost-effectiveness of high-throughput technologies, the number of studies collecting and analyzing large amounts of data has surged, opening new challenges for data analysis and research reproducibility. Variations across workstations and operating systems represent another obstacle [5, 6]. To overcome this issue, tools that allow the development of workflows [7] and software containers [8] have been proposed [9]. Containerized workflows facilitate collaborative projects by ensuring identical analysis processes and comparable results, and they allow the automation of data-intensive repetitive tasks [11]. They spare users with little bioinformatics or computational expertise the hassle of installing the required software and of designing and implementing often complex analysis orchestrations, while expert bioinformaticians can use them as a starting point for customized analyses, avoiding redundant solutions.

