Comparison of metagenomic samples using sequence signatures.

Bai Jiang, Kai Song, Xuegong Zhang, Minghua Deng, Jie Ren, Fengzhu Sun

doi:10.1186/1471-2164-13-730


Open Access

https://doi.org/10.1186/1471-2164-13-730


Abstract

Background: Sequence signatures, as defined by the frequencies of k-tuples (or k-mers, k-grams), have been used extensively to compare genomic sequences of individual organisms, to identify cis-regulatory modules, and to study the evolution of regulatory sequences. Recently, many next-generation sequencing (NGS) read data sets of metagenomic samples from a variety of environments have been generated. Assembling these reads can be difficult, and analysis methods based on mapping reads to genes or pathways are restricted by the availability and completeness of existing databases. Sequence-signature-based methods, however, need neither complete genomes nor existing databases and thus can potentially be very useful for comparing metagenomic samples using NGS read data. Still, the application of sequence signature methods to the comparison of metagenomic samples has not been well studied.

Results: We studied several dissimilarity measures for the comparison of metagenomic samples using sequence signatures: d2, d2* and d2S, recently developed by our group; a measure (hereinafter denoted Hao) used in CVTree, developed by Hao’s group (Qi et al., 2004); measures based on relative di-, tri-, and tetra-nucleotide frequencies as in Willner et al. (2009); and standard lp measures between the frequency vectors. We compared their performance using a series of extensive simulations and three real NGS metagenomic datasets: 39 fecal samples from 33 mammalian host species, 56 marine samples from across the world, and 13 fecal samples from human individuals. The results showed that the dissimilarity measure d2S can achieve superior performance when comparing metagenomic samples, both in clustering them into different groups and in recovering environmental gradients affecting microbial samples. New insights into the environmental factors affecting microbial compositions in metagenomic samples are obtained through the analyses. Our results show that sequence signatures of the mammalian gut are closely associated with the diet and gut physiology of the mammals, and that sequence signatures of marine communities are closely related to location and temperature.

Conclusions: Sequence signatures can successfully reveal major group and gradient relationships among metagenomic samples from NGS reads without alignment to reference databases. The d2S dissimilarity measure is a good choice in all application scenarios. The optimal choice of tuple size depends on sequencing depth, but it is quite robust within a range of choices for moderate sequencing depths.
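To make the notion of a sequence signature concrete, the following minimal Python sketch counts k-tuples across the reads of a sample, converts the counts to a frequency vector, and compares two samples with the d2 dissimilarity and an lp distance. It is an illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, only the forward strand is counted, windows containing symbols other than A, C, G, T are skipped, and d2 is written in its commonly used form as half of one minus the cosine of the angle between the two count vectors.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"


def ktuple_counts(reads, k):
    """Count every k-tuple over A/C/G/T across all reads of one sample.

    Windows containing any other symbol (e.g. N) are skipped. Only the
    forward strand is counted, which is a simplifying assumption.
    """
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if all(c in ALPHABET for c in word):
                counts[word] += 1
    return counts


def frequency_vector(counts, k):
    """Return relative k-tuple frequencies in a fixed lexicographic word order."""
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in words]


def d2_dissimilarity(x, y):
    """d2 = 0.5 * (1 - cosine similarity); 0 means identical direction."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    if norm == 0:
        raise ValueError("at least one signature vector is all zeros")
    return 0.5 * (1.0 - dot / norm)


def lp_distance(x, y, p=2):
    """Standard l_p distance between two frequency vectors."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


if __name__ == "__main__":
    # Toy reads, invented purely for illustration.
    sample_a = ["ACGTACGTTG", "GGATCCAAGT"]
    sample_b = ["TTTTACGAAC", "CCGGAATTCA"]
    k = 3
    fa = frequency_vector(ktuple_counts(sample_a, k), k)
    fb = frequency_vector(ktuple_counts(sample_b, k), k)
    print("d2:", d2_dissimilarity(fa, fb))
    print("l1:", lp_distance(fa, fb, p=1))
    print("l2:", lp_distance(fa, fb, p=2))
```

Because d2 depends only on the angle between the vectors, it gives the same value for raw counts and for normalized frequencies, whereas the lp distances are computed here on frequencies so that samples of different sequencing depth remain comparable.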

Highlights

Sequence signatures, as defined by the frequencies of k-tuples, have been used extensively to compare genomic sequences of individual organisms, to identify cis-regulatory modules, and to study the evolution of regulatory sequences.
We conducted a series of computational experiments, using both extensive simulations and real data analyses, to study the effectiveness of sequence signature methods in identifying group and gradient relationships of microbial community samples.
We studied the performance of a dissimilarity measure used in CVTree (Hao) [41] and measures based on di-, tri-, and tetra-nucleotide signatures (Willner) [38].

Summary

Introduction

Sequence signatures, as defined by the frequencies of k-tuples (or k-mers, k-grams), have been used extensively to compare genomic sequences of individual organisms, to identify cis-regulatory modules, and to study the evolution of regulatory sequences. We studied several dissimilarity measures for the comparison of metagenomic samples using sequence signatures: d2, d2* and d2S, recently developed by our group; a measure (hereinafter denoted Hao) used in CVTree, developed by Hao’s group (Qi et al., 2004); measures based on relative di-, tri-, and tetra-nucleotide frequencies as in Willner et al. (2009); and standard lp measures between the frequency vectors. We compared their performance using a series of extensive simulations and three real next-generation sequencing (NGS) metagenomic datasets: 39 fecal samples from 33 mammalian host species, 56 marine samples from across the world, and 13 fecal samples from human individuals. Taxon-based methods, on the other hand, calculate beta-diversity by binning sequences into Operational Taxonomic Units (OTUs), or by assigning sequences to, for example, species or genera, and then comparing samples by counting overlaps in the taxa [24,25,26].
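The d2S measure named above is not defined in this section, so the sketch below follows the form in which it is commonly stated: each k-tuple count is first centered by its expected count under a background model, and the centered counts are then compared with a self-standardized inner product. Several things here are assumptions made for illustration: the function names are hypothetical, the background is an order-0 (i.i.d.) nucleotide model estimated from each sample's own reads, and only the forward strand is counted. Other background models, such as higher-order Markov chains, could be substituted in `centered_counts`.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"


def count_words(reads, k):
    """Count k-tuples over A/C/G/T across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            word = read[i:i + k]
            if all(c in ALPHABET for c in word):
                counts[word] += 1
    return counts


def centered_counts(reads, k):
    """Return centered counts X_w - n * p_w for every k-tuple w.

    n is the total number of k-tuples in the sample and p_w is the
    probability of w under an i.i.d. background whose nucleotide
    frequencies are estimated from the same reads (an assumption).
    """
    word_counts = count_words(reads, k)
    base_counts = count_words(reads, 1)
    total_bases = sum(base_counts.values())
    if total_bases == 0:
        raise ValueError("sample contains no A/C/G/T characters")
    n = sum(word_counts.values())
    base_prob = {b: base_counts[b] / total_bases for b in ALPHABET}
    centered = {}
    for letters in product(ALPHABET, repeat=k):
        word = "".join(letters)
        p_word = math.prod(base_prob[b] for b in word)
        centered[word] = word_counts[word] - n * p_word
    return centered


def d2s_dissimilarity(reads_x, reads_y, k):
    """d2S dissimilarity in [0, 1] between two read sets."""
    x = centered_counts(reads_x, k)
    y = centered_counts(reads_y, k)
    num = sx = sy = 0.0
    for word in x:
        denom = math.sqrt(x[word] ** 2 + y[word] ** 2)
        if denom == 0:
            continue  # this word carries no signal in either sample
        num += x[word] * y[word] / denom
        sx += x[word] ** 2 / denom
        sy += y[word] ** 2 / denom
    if sx == 0 or sy == 0:
        raise ValueError("a sample has no deviation from its background")
    return 0.5 * (1.0 - num / (math.sqrt(sx) * math.sqrt(sy)))


if __name__ == "__main__":
    # Toy reads, invented purely for illustration.
    sample_a = ["ACGTTGCAAC", "GGATCCAAGT"]
    sample_b = ["TTTTACGAAC", "CCGGAATTCA"]
    print("d2S:", d2s_dissimilarity(sample_a, sample_b, k=2))
```

Centering removes the part of each count that is explained by base composition alone, so the comparison emphasizes k-tuples that are over- or under-represented relative to the background. A matrix of pairwise values from such a function could then be passed to a clustering or ordination method to look for group and gradient relationships among samples.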



Journal: BMC Genomics
Publication Date: Dec 1, 2012
Citations: 122
License type: cc-by
