Abstract

Motivation: An important feature of microbiome count data is the presence of a large number of zeros. A common strategy for handling these excess zeros is to add a small number, called a pseudo-count (e.g., 1), to the counts. Other strategies model the excess zero counts with various probability models. Although adding a pseudo-count is simple and widely used, this paper demonstrates that it is not ideal. On the other hand, methods that model excess zeros with a probability model often make the implicit assumption that all zeros can be explained by a common probability model. As described in this article, that assumption is not always appropriate, because microbiome data potentially contain three types (sources) of zeros. The purpose of this paper is to develop a simple methodology to identify and accommodate the three types of zeros and to test hypotheses about the relative abundance of taxa in two or more experimental groups. Another major contribution of this paper is a procedure for constrained (directional or ordered) inference when there are more than two ordered experimental groups (e.g., subjects ordered by diet, age group, or environmental exposure group). To our knowledge, this is the first paper to address such problems in the analysis of microbiome data.

Results: Using extensive simulation studies, we demonstrate that the proposed methodology not only controls the false discovery rate (FDR) at the desired level but also competes well in terms of power with DESeq2, a popular procedure derived from the RNA-Seq literature. As expected, the method using pseudo-counts tends to be very conservative, and the classical t-test, which ignores the underlying simplex structure of the data, has an inflated FDR.
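As a minimal illustration of the pseudo-count strategy discussed above (not the method proposed in this paper), the Python sketch below adds a pseudo-count to one specimen's counts and then applies a centered log-ratio (CLR) transform, a common log-ratio transform for compositional data. The function name `clr_with_pseudocount`, the choice of the CLR transform, and the sample counts are assumptions made for illustration only.

```python
import math


def clr_with_pseudocount(counts, pseudo=1.0):
    """Centered log-ratio transform after adding a pseudo-count to every entry.

    counts: non-negative OTU counts for one specimen (hypothetical input).
    pseudo: the pseudo-count added to each count so that log(0) is avoided.
    """
    shifted = [c + pseudo for c in counts]
    logs = [math.log(x) for x in shifted]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    # Each CLR coordinate is log(x_i / geometric mean); they sum to zero.
    return [value - mean_log for value in logs]


# Hypothetical OTU counts for a single specimen; two taxa are unobserved.
sample = [0, 3, 120, 0, 877]
for pseudo in (0.5, 1.0):
    clr = clr_with_pseudocount(sample, pseudo)
    print(pseudo, [round(v, 3) for v in clr])
```

Running the loop shows that every zero in the specimen is mapped to the same transformed value, and that this value shifts when the pseudo-count changes from 0.5 to 1 even though the observed counts are unchanged.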

Highlights

  • Microbial count data are represented using operational taxonomic units (OTUs) from 16S rRNA studies

  • A second approach is based on a negative binomial distribution, which is commonly used to model OTU count data in microbiome studies

  • One of the challenges in dealing with compositional microbiome data is the high frequency of zero counts
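To make the notion of excess zeros concrete, the sketch below compares the observed fraction of zeros for one OTU with the zero probability implied by a negative binomial fit. It assumes the mean/size parameterization, Var(Y) = μ + μ²/r, under which P(Y = 0) = (r / (r + μ))^r, and it fits (μ, r) by the method of moments purely for illustration. The counts and function names are hypothetical, and this is not the procedure used in the paper.

```python
def nb_zero_prob(mu, size):
    """P(Y = 0) for a negative binomial with mean mu and size (dispersion) r.

    Mean/size parameterization: Var(Y) = mu + mu**2 / size,
    so P(Y = 0) = (size / (size + mu)) ** size.
    """
    return (size / (size + mu)) ** size


def method_of_moments_nb(counts):
    """Rough method-of-moments estimate of (mu, size); needs variance > mean."""
    n = len(counts)
    mu = sum(counts) / n
    var = sum((c - mu) ** 2 for c in counts) / (n - 1)
    if var <= mu:
        raise ValueError("no overdispersion; moment estimate of size undefined")
    size = mu ** 2 / (var - mu)
    return mu, size


# Hypothetical counts of one OTU across 12 specimens.
otu = [0, 0, 0, 0, 0, 0, 0, 5, 12, 0, 30, 8]
mu, size = method_of_moments_nb(otu)
observed = sum(c == 0 for c in otu) / len(otu)
expected = nb_zero_prob(mu, size)
print(f"observed zero fraction {observed:.2f}, NB-implied {expected:.2f}")
```

For these hypothetical counts, the observed zero fraction (8 of 12) exceeds the zero probability implied by the fitted negative binomial, the kind of discrepancy that a single count model can fail to capture.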


Summary

Introduction

Microbial count data are represented using operational taxonomic units (OTUs) from 16S rRNA studies. For each specimen (e.g., a fecal sample) drawn from an ecosystem (e.g., the gut), the number of occurrences of each OTU is measured, and the resulting OTU table is summarized to obtain the relative abundances of bacterial taxa in each specimen. These OTU counts may be summarized at any level of the bacterial phylogeny, e.g., species, genus, family, or order. Since the relative abundances of taxa in a specimen sum to 1, these are compositional data, and they reside in a simplex rather than in the entire Euclidean space. Another important feature of these microbiome data is that not all taxa
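The summarization step described in the paragraph above (aggregating OTU counts to a coarser taxonomic level, then dividing by the specimen total) can be sketched as follows. The OTU identifiers, genus assignments, and the function name `aggregate_and_close` are hypothetical. The sketch shows that the resulting relative abundances sum to 1, so each specimen becomes a point in the simplex, and that taxa with zero counts remain exactly zero after the division.

```python
from collections import defaultdict


def aggregate_and_close(otu_counts, otu_to_taxon):
    """Sum OTU counts up to a coarser taxonomic level, then divide by the
    specimen total so the relative abundances sum to 1 (a point in the simplex).

    otu_counts: dict mapping OTU id -> count for one specimen (hypothetical).
    otu_to_taxon: dict mapping OTU id -> taxon name at the chosen level.
    """
    totals = defaultdict(int)
    for otu, count in otu_counts.items():
        totals[otu_to_taxon[otu]] += count
    library_size = sum(totals.values())
    if library_size == 0:
        raise ValueError("specimen has no reads")
    return {taxon: c / library_size for taxon, c in totals.items()}


# Hypothetical OTU table row and genus-level assignments.
counts = {"OTU1": 40, "OTU2": 10, "OTU3": 0, "OTU4": 50}
genus = {
    "OTU1": "Bacteroides",
    "OTU2": "Bacteroides",
    "OTU3": "Prevotella",
    "OTU4": "Faecalibacterium",
}
rel = aggregate_and_close(counts, genus)
print(rel)  # → {'Bacteroides': 0.5, 'Prevotella': 0.0, 'Faecalibacterium': 0.5}
```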
