Abstract

Background: Simulated metagenomic reads are widely used to benchmark software and workflows for metagenome interpretation. The results of metagenomic benchmarks depend on the assumptions about their underlying ecosystems. Conclusions from benchmark studies are therefore limited to the ecosystems they mimic. Ideally, simulations are therefore based on genomes that realistically resemble a particular metagenomic community.

Results: We developed Tamock to facilitate the realistic simulation of metagenomic reads according to a metagenomic community, based on real sequence data. Benchmark samples can be created from all genomes and taxonomic domains present in NCBI RefSeq. Tamock automatically determines taxonomic profiles from shotgun sequence data, selects reference genomes accordingly, and uses them to simulate metagenomic reads. We present an example use case for Tamock by assessing assembly and binning method performance for selected microbiomes.

Conclusions: Tamock facilitates automated simulation of habitat-specific benchmark metagenomic data based on real sequence data and is implemented as a user-friendly command-line application, providing extensive additional information along with the simulated benchmark data. The resulting benchmarks enable an assessment of computational methods, workflows, and parameters specifically for the metagenomic habitat or ecosystem of a metagenomic study.

Availability: Source code, documentation, and install instructions are freely available at GitHub (https://github.com/gerners/tamock).

Highlights

  • Simulated metagenomic reads are widely used to benchmark software and workflows for metagenome interpretation

  • We developed Tamock to automatically create benchmark data based directly on the taxonomic composition of a metagenomic sample, providing a sample-specific benchmark for a particular habitat

  • We present the use of benchmark samples created by Tamock for the evaluation of assembly and binning methods as an example use case for selected urban metagenomes and samples from the Integrative Human Microbiome Project [12] (Additional file 2: Table S1)
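
To illustrate the idea of deriving a benchmark from a sample's taxonomic composition, the following is a minimal sketch and not Tamock's actual implementation. It assumes a taxonomic profile is already available as a mapping from taxon to abundance and shows one possible way to split a simulated read budget across the selected taxa in proportion to that profile. The function name `allocate_reads`, the largest-remainder rounding, and the example taxa and values are all hypothetical choices made for illustration.

```python
from math import floor


def allocate_reads(profile, total_reads):
    """Split total_reads across taxa in proportion to their abundance.

    profile maps a taxon name to a non-negative abundance (any scale).
    Largest-remainder rounding keeps the counts summing to total_reads.
    This is an illustrative assumption, not Tamock's documented method.
    """
    total_abundance = sum(profile.values())
    if total_abundance <= 0:
        raise ValueError("profile must contain positive abundance")

    # Exact (fractional) share of the read budget for each taxon.
    exact = {t: total_reads * a / total_abundance for t, a in profile.items()}
    counts = {t: floor(x) for t, x in exact.items()}

    # Hand out the reads lost to rounding, largest fractional part first.
    leftover = total_reads - sum(counts.values())
    by_remainder = sorted(exact, key=lambda t: exact[t] - counts[t], reverse=True)
    for taxon in by_remainder[:leftover]:
        counts[taxon] += 1
    return counts


# Made-up profile for illustration only (abundances in arbitrary units).
example_profile = {
    "Escherichia coli": 50,
    "Bacteroides fragilis": 30,
    "Staphylococcus aureus": 20,
}
example_counts = allocate_reads(example_profile, 1000)
```

In a sketch like this, each per-taxon count would then be passed to a read simulator together with the reference genome chosen for that taxon. Rounding with largest remainders is only one option; it is used here so that the simulated sample has exactly the requested number of reads.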


Summary

Introduction

Simulated metagenomic reads are widely used to benchmark software and workflows for metagenome interpretation. The results of metagenomic benchmarks depend on the assumptions about their underlying ecosystems. Benchmark studies use simulated data or sequenced mock communities with known composition. Conclusions of such benchmarks are limited to their underlying data. For a specific study, benchmark data properties should resemble the actual data of the study as closely as possible. Creating such benchmark data is challenging, since metagenomic communities can vary substantially in complexity and composition, including fractions of sequences of unknown origin. Artificial design and the resulting biases further limit the scope and power of benchmarks.

Gerner et al. BMC Bioinformatics (2021) 22:227

