CAMISIM: simulating metagenomes and microbial communities

Adrian Fritz, Till R Lesker, Andreas Bremges, Johannes Dröge, Alexander Sczyrba, Eik Dahms, Aaron E Darling, Jessika Fiedler, Matthew Z DeMaere, Stephan Majda, Peter Hofmann, Peter Belmann, Alice C McHardy

doi:10.1186/s40168-019-0633-6

Abstract

Background: Shotgun metagenome data sets of microbial communities are highly diverse, not only due to the natural variation of the underlying biological systems, but also due to differences in laboratory protocols, replicate numbers, and sequencing technologies. Accordingly, to effectively assess the performance of metagenomic analysis software, a wide range of benchmark data sets are required.

Results: We describe the CAMISIM microbial community and metagenome simulator. The software can model different microbial abundance profiles, multi-sample time series, and differential abundance studies, includes real and simulated strain-level diversity, and generates second- and third-generation sequencing data from taxonomic profiles or de novo. Gold standards are created for sequence assembly, genome binning, taxonomic binning, and taxonomic profiling. CAMISIM generated the benchmark data sets of the first CAMI challenge. For two simulated multi-sample data sets of the human and mouse gut microbiomes, we observed high functional congruence to the real data. As further applications, we investigated the effect of varying evolutionary genome divergence, sequencing depth, and read error profiles on two popular metagenome assemblers, MEGAHIT and metaSPAdes, on several thousand small data sets generated with CAMISIM.

Conclusions: CAMISIM can simulate a wide variety of microbial communities and metagenome data sets together with standards of truth for method evaluation. All data sets and the software are freely available at https://github.com/CAMI-challenge/CAMISIM

Highlights

Extensive 16S rRNA gene amplicon and shotgun metagenome sequencing efforts have been and are being undertaken to catalogue the human microbiome in health and disease [1, 2] and to study microbial communities of medical, pharmaceutical, or biotechnological relevance [3,4,5,6,7,8].
We have since learned that naturally occurring microbial communities cover a wide range of organismal complexities—with populations ranging from half
Owing to the large diversity of generated data, the ability to generate realistic benchmark data sets for particular experimental setups is essential for assessing computational metagenomics software.

Summary

Results

Comparison to the state-of-the-art: We tested seven simulators and compared them to CAMISIM (Table 1).

To assess the effect of sequencing errors, four read data sets were simulated: three using wgsim with uniform error rates of 0%, 2%, and 5%, and one using ART with the CAMI challenge error profile (ART CAMI). Both assemblers were run on these data sets with default options, except for the phred-offset parameter for metaSPAdes, which was set to 33.

For each of the 152 · 20 = 3040 pairs of original and evolved genome sequences, we simulated single-sample minimal metagenomes at equal genome abundances, with error-free reads at 50× coverage using wgsim. This constitutes good coverage for the analyzed assemblers, as shown in the previous section.

Since only a few complete reference genomes were available for mouse gut, the “scaffold” quality for downloading genomes was chosen.
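The 50× error-free wgsim setup described above comes down to simple arithmetic: the number of read pairs is the target coverage times the genome length, divided by the bases contributed per pair. The following is a minimal sketch of that calculation, not the authors' pipeline. The read length (150 bp), the 5 Mbp genome length, the fixed seed, the file names, and the helper names `read_pairs_for_coverage` and `wgsim_command` are all assumptions made for illustration; the summary does not state them. The wgsim options used (`-e` base error rate, `-r` mutation rate, `-N` number of read pairs, `-1`/`-2` read lengths, `-S` seed) are standard wgsim flags.

```python
import math
import shlex


def read_pairs_for_coverage(genome_length: int, coverage: float, read_length: int) -> int:
    """Read pairs needed so that mean coverage reaches `coverage`.

    Each pair contributes 2 * read_length bases, so
    pairs = coverage * genome_length / (2 * read_length), rounded up.
    """
    return math.ceil(coverage * genome_length / (2 * read_length))


def wgsim_command(reference: str, out_prefix: str, genome_length: int,
                  coverage: float = 50.0, read_length: int = 150,
                  error_rate: float = 0.0, seed: int = 42) -> str:
    """Build (but do not run) a wgsim command line for one genome.

    read_length and seed are illustrative assumptions, not values from the paper.
    """
    n_pairs = read_pairs_for_coverage(genome_length, coverage, read_length)
    args = [
        "wgsim",
        "-e", str(error_rate),   # uniform base error rate (0 = error-free reads)
        "-r", "0",               # no simulated mutations; divergence comes from the input genome
        "-N", str(n_pairs),      # number of read pairs
        "-1", str(read_length),  # length of the first read
        "-2", str(read_length),  # length of the second read
        "-S", str(seed),         # fixed seed for reproducibility
        reference,
        f"{out_prefix}_1.fq",
        f"{out_prefix}_2.fq",
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # Hypothetical 5 Mbp genome at 50x coverage with error-free reads.
    print(wgsim_command("genome.fa", "sample", genome_length=5_000_000))
    # The three uniform error rates named in the text (0%, 2%, 5%).
    for rate in (0.0, 0.02, 0.05):
        print(wgsim_command("genome.fa", f"err{rate}", 5_000_000, error_rate=rate))
```

Because each minimal metagenome holds two genomes at equal abundance, one such command per genome at the same coverage would be one way to obtain equal abundances; whether the original experiments were run this way is not stated here.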



Journal: Microbiome	Publication Date: Feb 8, 2019
Citations: 135	License type: open-access


