A latent allocation model for the analysis of microbial composition and disease

Ko Abe, Masaaki Hirayama, Kinji Ohno, Teppei Shimamura

doi:10.1186/s12859-018-2530-6

Abstract

Background: Establishing the relationship between microbiota and specific diseases is important but requires appropriate statistical methodology. A distinctive feature of microbiome count data is the presence of a large number of zeros, which makes the data difficult to analyze in case-control studies. Most existing approaches either add a small number called a pseudo-count or use probability models such as the multinomial and Dirichlet-multinomial distributions to explain the excess zero counts, which may produce unnecessary biases and impose a correlation structure that is unsuitable for microbiome data.

Results: The purpose of this article is to develop a new probabilistic model, called BERnoulli and MUltinomial Distribution-based latent Allocation (BERMUDA), to address these problems. BERMUDA enables us to describe how bacterial composition and the presence of a certain disease differ among samples. We also provide a simple and efficient learning procedure for the proposed model using an annealing EM algorithm.

Conclusion: We illustrate the performance of the proposed method through both simulation and real data analysis. BERMUDA is implemented in R and is available from GitHub (https://github.com/abikoushi/Bermuda).

Highlights

Establishing the relationship between microbiota and specific diseases is important but requires appropriate statistical methodology.
One of the goals of case-control studies using microbiome data is to investigate whether cases differ from controls in terms of the microbiome composition of a particular body ecosystem and which taxa are responsible for any differences observed [1]. (Here, we use the generic term “taxa” to denote a particular phylogenetic classification.) In these studies, microbiome data are represented as count data using operational taxonomic units (OTUs).
We extract the associations between microbial composition and a specific disease by supposing that there exist L latent clusters that vary with microbial composition and the disease risk.
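
As a rough illustration of this latent-cluster idea, the Python sketch below simulates one subject under an assumed generative process: a cluster is drawn, disease status follows a Bernoulli distribution with that cluster's risk, and taxon counts follow a multinomial distribution with that cluster's composition. The function name `sample_subject`, the parameter names `pi`, `rho`, and `phi`, and all numeric values are hypothetical and chosen only for illustration. This is not the BERMUDA implementation, and the paper's own specification may differ.

```python
import random
from collections import Counter

def sample_subject(pi, rho, phi, n_reads, rng):
    """Draw one subject from a hypothetical latent-cluster model.

    pi  : cluster weights (length L)
    rho : disease risk per cluster (length L)
    phi : taxon probabilities per cluster (L rows, one entry per taxon)
    """
    n_clusters = len(pi)
    # Latent cluster assignment.
    z = rng.choices(range(n_clusters), weights=pi, k=1)[0]
    # Disease status: Bernoulli with the cluster's risk.
    y = 1 if rng.random() < rho[z] else 0
    # Taxon counts: multinomial, built by tallying n_reads categorical draws.
    n_taxa = len(phi[z])
    reads = rng.choices(range(n_taxa), weights=phi[z], k=n_reads)
    tally = Counter(reads)
    counts = [tally.get(k, 0) for k in range(n_taxa)]
    return z, y, counts

# Made-up parameters: L = 2 clusters, 4 taxa.
pi = [0.6, 0.4]
rho = [0.1, 0.7]
phi = [[0.7, 0.3, 0.0, 0.0],
       [0.1, 0.1, 0.4, 0.4]]

rng = random.Random(0)
z, y, counts = sample_subject(pi, rho, phi, n_reads=50, rng=rng)
print(z, y, counts)
```

Because cluster 0 in this toy setup assigns zero probability to the last two taxa, subjects drawn from it show zero counts for those taxa without any pseudo-count being involved.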

Summary

Introduction

Establishing the relationship between microbiota and specific diseases is important but requires appropriate statistical methodology. A distinctive feature of microbiome count data is the presence of a large number of zeros, which makes the data difficult to analyze in case-control studies. Most existing approaches either add a small number called a pseudo-count or use probability models such as the multinomial and Dirichlet-multinomial distributions to explain the excess zero counts, which may produce unnecessary biases and impose a correlation structure that is unsuitable for microbiome data. For the pseudo-count strategy, Weiss et al. (2017) [3] noted that there is no clear consensus on how to choose the value. Another common strategy to mitigate the effects of these excessive zeros is to use non-parametric statistical tests. Wagner et al. (2011) [4] described a test statistic that combines the proportion of zeros in the data with statistics on the nonzero values.
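
To make the pseudo-count concern concrete, the short Python sketch below computes a log2 fold change between two counts after adding a pseudo-count, and shows that the result moves with the value chosen when one count is zero. The helper `log2_fold_change` and the example counts are hypothetical and are not taken from the paper.

```python
import math

def log2_fold_change(count_a, count_b, pseudo):
    """log2 of (count_b + pseudo) / (count_a + pseudo)."""
    if pseudo <= 0:
        raise ValueError("pseudo must be positive")
    return math.log2((count_b + pseudo) / (count_a + pseudo))

# A taxon absent in one sample (0 reads) and present in another (10 reads):
# the estimated fold change depends on the arbitrary pseudo-count.
for pseudo in (1.0, 0.5, 0.01):
    print(pseudo, round(log2_fold_change(0, 10, pseudo), 2))
```

The smaller the pseudo-count, the larger the apparent fold change, so a downstream comparison can be driven by this choice rather than by the data.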

Journal: BMC Bioinformatics	Publication Date: Dec 1, 2018
Citations: 5	License type: open-access
