Abstract
We present metaSNV, a tool for single nucleotide variant (SNV) analysis in metagenomic samples, capable of comparing populations of thousands of bacterial and archaeal species. The tool takes as input nucleotide sequence alignments to reference genomes in standard SAM/BAM format, performs SNV calling for individual samples and across the whole data set, and generates various statistics for individual species, including allele frequencies and nucleotide diversity per sample as well as distances and fixation indices across samples. Using published data from 676 metagenomic samples from different sites in the oral cavity, we show that the results of metaSNV are comparable to those of MIDAS, an alternative implementation for metagenomic SNV analysis, while data processing is faster and has a smaller storage footprint. Moreover, we implement a set of distance measures that allow the comparison of genomic variation across metagenomic samples and delineate sample-specific variants to enable the tracking of specific strain populations over time. The implementation of metaSNV is available at: http://metasnv.embl.de/.
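To make the per-sample allele frequencies and across-sample distances mentioned above concrete, the following is a minimal sketch in Python. It is not metaSNV code: the function names, the input layout (one `(alt_reads, total_reads)` pair per genomic position) and the choice of a mean absolute frequency difference (a normalized Manhattan distance) are assumptions made for illustration, and the read counts are invented toy values.

```python
def allele_frequencies(counts):
    """Turn per-position (alt_reads, total_reads) pairs into allele frequencies.

    Positions with zero coverage yield None so they can be skipped later.
    """
    return [alt / total if total > 0 else None for alt, total in counts]


def mean_abs_frequency_distance(freqs_a, freqs_b):
    """Mean absolute allele-frequency difference between two samples.

    Only positions covered in both samples are compared; returns None
    when the samples share no covered position.
    """
    shared = [
        (a, b)
        for a, b in zip(freqs_a, freqs_b)
        if a is not None and b is not None
    ]
    if not shared:
        return None
    return sum(abs(a - b) for a, b in shared) / len(shared)


# Toy read counts for four positions in two hypothetical samples.
sample_1 = [(0, 20), (10, 20), (5, 10), (0, 0)]
sample_2 = [(20, 20), (10, 20), (0, 10), (4, 8)]

f1 = allele_frequencies(sample_1)
f2 = allele_frequencies(sample_2)
d = mean_abs_frequency_distance(f1, f2)
print(f1, f2, d)
```

The last position is uncovered in the first sample, so it is excluded from the comparison rather than treated as a frequency of zero; a real pipeline would typically also apply minimum-coverage filters before comparing samples.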
Highlights
Strain-level analysis of metagenomes has been shown to be feasible even for complex communities such as the human gut [1], and a number of tools have been developed to enable researchers to study microbial communities at this level of resolution.
We show that our approach identifies extensive variation within microbial species and that this variation is informative in quantifying differences between metagenomic samples.
As a demonstration, using data from the Human Microbiome Project (HMP) [10], we show that the genomic variation of most bacteria that inhabit the human oral cavity is highly correlated with the specific sub-habitat from which they were collected and that individual single nucleotide variant (SNV) profiles are stable over time.
Summary
Strain-level analysis of metagenomes has been shown to be feasible even for complex communities such as the human gut [1], and a number of tools have been developed to enable researchers to study microbial communities at this level of resolution. We do not compare our results to the output of tools that use only a subset of the genome to determine strain haplotypes, whether a set of common marker genes [5] or a species-specific set [6].