Advances in next-generation sequencing technology have enabled the high-throughput profiling of metagenomes and accelerated microbiome studies. Recently, there has been a rise in quantitative studies that aim to decipher the microbiome co-occurrence network and its underlying community structure based on metagenomic sequence data. Uncovering the complex microbiome community structure is essential to understanding the role of the microbiome in disease progression and susceptibility. Taxonomic abundance data generated from metagenomic sequencing technologies are high-dimensional and compositional, and they suffer from uneven sampling depth, over-dispersion, and zero-inflation. These characteristics often undermine the reliability of current methods for microbiome community detection. To study the microbiome co-occurrence network and perform community detection, we propose a generalized Bayesian stochastic block model tailored for microbiome data analysis, in which the data are transformed using the recently developed modified centered log-ratio transformation. Our model also allows us to leverage taxonomic tree information through a Markov random field prior. The model parameters are jointly inferred using Markov chain Monte Carlo sampling techniques. Our simulation study showed that the proposed approach performs better than competing methods even when taxonomic tree information is non-informative. We applied our approach to a real urinary microbiome dataset from postmenopausal women. To the best of our knowledge, this is the first time the urinary microbiome co-occurrence network structure in postmenopausal women has been studied. In summary, this statistical methodology provides a new tool for facilitating advanced microbiome studies.