Abstract

BackgroundEstablishing the relationship between microbiota and specific diseases is important but requires appropriate statistical methodology. A specialized feature of microbiome count data is the presence of a large number of zeros, which makes it difficult to analyze in case-control studies. Most existing approaches either add a small number called a pseudo-count or use probability models such as the multinomial and Dirichlet-multinomial distributions to explain the excess zero counts, which may produce unnecessary biases and impose a correlation structure taht is unsuitable for microbiome data.ResultsThe purpose of this article is to develop a new probabilistic model, called BERnoulli and MUltinomial Distribution-based latent Allocation (BERMUDA), to address these problems. BERMUDA enables us to describe the differences in bacteria composition and a certain disease among samples. We also provide a simple and efficient learning procedure for the proposed model using an annealing EM algorithm.ConclusionWe illustrate the performance of the proposed method both through both the simulation and real data analysis. BERMUDA is implemented with R and is available from GitHub (https://github.com/abikoushi/Bermuda).

Highlights

  • Establishing the relationship between microbiota and specific diseases is important but requires appropriate statistical methodology

  • One of the goals for case-control studies using microbiome data is to investigate whether cases differ from controls in term of the microbiome composition of a particular body ecosystems and which taxa are responsible for any differences observed [1]. (Here, we use the generic term “taxa” to denote a particular phylogenetic classification.) These studies present microbiome data are represented as count data using operational taxonomic units (OTUs)

  • We extract the associations between microbial composition and a specific disease by supposing that there exist L latent clusters that vary with microbial composition and the disease risk

Read more

Summary

Introduction

Establishing the relationship between microbiota and specific diseases is important but requires appropriate statistical methodology. A specialized feature of microbiome count data is the presence of a large number of zeros, which makes it difficult to analyze in case-control studies. Most existing approaches either add a small number called a pseudo-count or use probability models such as the multinomial and Dirichlet-multinomial distributions to explain the excess zero counts, which may produce unnecessary biases and impose a correlation structure taht is unsuitable for microbiome data. A common strategy to handle these excessive zeros is to add a small number called a pseudo-count. Weiss et al (2017) [3] noted that there is no clear consensus on how to choose that value Another common strategy to mitigate the effects of these excessive zeros is to use non-parametric statistical tests. Wagner et al (2011) [4] described a test statistic that combines the proportion of zeros in the data with the statistics on values other than 0

Objectives
Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call