Variability in Metagenomic Count Data and Its Influence on the Identification of Differentially Abundant Genes.

Viktor Jonsson, Tobias Österlund, Olle Nerman, Erik Kristiansson

doi:10.1089/cmb.2016.0180

Abstract

Metagenomics is the study of microorganisms in environmental and clinical samples using high-throughput sequencing of random fragments of their DNA. Since metagenomics does not require any prior culturing of isolates, entire microbial communities can be studied directly in their natural state. In metagenomics, the abundance of genes is quantified by sorting and counting the DNA fragments. The resulting count data are high-dimensional and affected by high levels of technical and biological noise that make the statistical analysis challenging. In this article, we introduce a hierarchical overdispersed Poisson model to explore the variability in metagenomic data. By analyzing three comprehensive data sets, we show that the gene-specific variability varies substantially between genes and is dependent on biological function. We also assess the power of identifying differentially abundant genes and show that incorrect assumptions about the gene-specific variability can lead to unacceptably high rates of false positives. Finally, we evaluate shrinkage approaches to improve the variance estimation and show that the choice of prior significantly affects the statistical power. The results presented in this study further elucidate the complex variance structure of metagenomic data and provide suggestions for accurate and reliable identification of differentially abundant genes.

Journal: Journal of Computational Biology
Publication Date: Apr 1, 2017
Citations: 22

Similar Papers

Modelling of zero-inflation improves inference of metagenomic gene count data.
Viktor Jonsson ... Olle Nerman
Statistical Methods in Medical Research | VOL. 28
25 Nov 2018

Comparison of normalization methods for the analysis of metagenomic gene abundance data
Mariana Buongermino Pereira ... Erik Kristiansson
BMC Genomics | VOL. 19
20 Apr 2018

Network construction and structure detection with metagenomic count data.
Zhenqiu Liu ... Steven Piantadosi
BioData Mining | VOL. 8
01 Jun 2015

Statistical evaluation of methods for identification of differentially abundant genes in comparative metagenomics.
Viktor Jonsson ... Erik Kristiansson
BMC Genomics | VOL. 17
25 Jan 2016
