Abstract
The human microbiome is a collection of microorganisms that form complex communities and collectively affect host health. Recent advances in next-generation sequencing technology have enabled high-throughput profiling of the human microbiome, which calls for statistical models that construct microbial networks from microbiome sequencing count data. Because microbiome count data are high-dimensional and suffer from uneven sampling depth, over-dispersion, and zero-inflation, they can bias network estimation and require specialized analytical tools. Here we propose a general framework, HARMONIES (Hybrid Approach foR MicrobiOme Network Inferences via Exploiting Sparsity), to infer a sparse microbiome network. HARMONIES first uses a zero-inflated negative binomial (ZINB) distribution to model the skewness and excess zeros in the microbiome data, and it incorporates a stochastic process prior for sample-wise normalization. It then infers a sparse and stable network by imposing non-trivial regularizations within a Gaussian graphical model. In comprehensive simulation studies, HARMONIES outperformed four other commonly used methods. When applied to published microbiome data from a colorectal cancer study, it discovered a novel community with disease-enriched bacteria. In summary, HARMONIES is a novel and useful statistical framework for microbiome network inference, and it is available at https://github.com/shuangj00/HARMONIES.
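In a Gaussian graphical model, the network is read off the sparse precision (inverse covariance) matrix: a nonzero off-diagonal entry links two taxa, and a zero means they are conditionally independent given all other taxa. The sketch below assumes a precision matrix has already been estimated; the matrix values are made up, and `partial_correlations` and `edge_list` are hypothetical helpers for illustration, not functions from the HARMONIES package.

```python
import math

def partial_correlations(omega):
    """Convert a precision matrix (list of lists) into partial correlations.

    Uses the standard Gaussian graphical model identity
    rho_jk = -omega_jk / sqrt(omega_jj * omega_kk).
    """
    p = len(omega)
    rho = [[1.0 if j == k else 0.0 for k in range(p)] for j in range(p)]
    for j in range(p):
        for k in range(p):
            if j != k:
                rho[j][k] = -omega[j][k] / math.sqrt(omega[j][j] * omega[k][k])
    return rho

def edge_list(omega, tol=1e-8):
    """Return (j, k, partial correlation) for each nonzero off-diagonal entry, j < k."""
    rho = partial_correlations(omega)
    p = len(omega)
    return [(j, k, rho[j][k])
            for j in range(p) for k in range(j + 1, p)
            if abs(omega[j][k]) > tol]

# Hypothetical sparse precision matrix for three taxa: taxa 0-1 and 1-2 are
# linked, while taxa 0 and 2 are conditionally independent given taxon 1.
omega = [[2.0, -0.8, 0.0],
         [-0.8, 2.0, 0.5],
         [0.0, 0.5, 2.0]]

for j, k, r in edge_list(omega):
    print(f"taxon {j} -- taxon {k}: partial correlation {r:.2f}")
# taxon 0 -- taxon 1: partial correlation 0.40
# taxon 1 -- taxon 2: partial correlation -0.25
```

Sparsity in the precision matrix translates directly into missing edges, which is why penalized estimators that set many entries exactly to zero yield interpretable networks.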
Highlights
Microbiota form complex community structures and collectively affect human health
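Because real microbiome data are characterized by zero-inflation and over-dispersion, we model the observed count $y_{ij}$ of taxon $j$ in sample $i$ through a zero-inflated negative binomial (ZINB) model: $y_{ij} \sim \pi_i \mathbb{I}(y_{ij} = 0) + (1 - \pi_i)\,\mathrm{NB}(\lambda_{ij}, \phi_j)$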
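To make the mixture concrete, the sketch below evaluates the ZINB probability mass function. It assumes that $\mathrm{NB}(\lambda, \phi)$ is parameterized by its mean $\lambda$ and dispersion $\phi$ (variance $\lambda + \lambda^2/\phi$), a common convention that should be checked against the paper. The function names `nb_logpmf` and `zinb_pmf` are hypothetical and are not part of HARMONIES.

```python
import math

def nb_logpmf(y, lam, phi):
    """Log-pmf of a negative binomial with mean lam and dispersion phi.

    Assumed parameterization: variance = lam + lam**2 / phi.
    """
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (lam + phi))
            + y * math.log(lam / (lam + phi)))

def zinb_pmf(y, pi, lam, phi):
    """P(Y = y) under the mixture pi * I(y = 0) + (1 - pi) * NB(lam, phi)."""
    point_mass = pi if y == 0 else 0.0  # extra zeros from the zero-inflation part
    return point_mass + (1 - pi) * math.exp(nb_logpmf(y, lam, phi))

# Probability of observing a zero with and without zero-inflation.
print(f"{zinb_pmf(0, 0.3, 5.0, 2.0):.4f}")  # 0.3571
print(f"{zinb_pmf(0, 0.0, 5.0, 2.0):.4f}")  # 0.0816
```

With $\pi_i = 0.3$, a zero is more than four times as likely as under the plain negative binomial with the same mean and dispersion, which is the excess-zero behavior the model is meant to capture.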
We introduce HARMONIES as a statistical framework to infer sparse networks using microbiome sequencing data
Summary
Microbiota form complex community structures and collectively affect human health. Studying their relationships as a network can provide key insights into their biological mechanisms. In existing approaches, each sample is transformed by one of several log-ratio transformations to remove the unit-sum constraint of compositional data. While this type of normalization is simple to implement and preserves the original ordering of the counts within a sample, it fails to capture sample-to-sample variation and overlooks the excess zeros in microbiome data. Kurtz et al. (2015) proposed a statistical model for inferring microbial ecological networks that estimates a sparse precision matrix of a multivariate Gaussian model and relies on the graphical lasso (Glasso) (Friedman et al., 2008). However, their data normalization step needs to be improved to account for the unique characteristics of microbiome count data. The R package HARMONIES is freely available at https://github.com/shuangj00/HARMONIES.
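The log-ratio normalization discussed in the summary can be illustrated with the centered log-ratio (clr) transform. The source does not say which log-ratio transformation is meant, so clr with a pseudocount is an assumption here, and `clr` is a hypothetical helper. The sketch shows both why the transform removes the unit-sum constraint (it is invariant to rescaling a sample) and why it handles zeros poorly (their transformed value depends on an arbitrary pseudocount).

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's counts.

    A pseudocount is added because log(0) is undefined; subtracting the mean
    log (dividing by the geometric mean) makes the result scale-invariant,
    so raw counts and relative abundances give the same output.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Scale invariance: doubling every count leaves the clr values unchanged.
a = clr([1, 2, 4], pseudocount=0.0)
b = clr([2, 4, 8], pseudocount=0.0)
print(all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(a, b)))  # True

# The value assigned to a zero count shifts with the chosen pseudocount.
sample = [10, 0, 5, 85]
print(clr(sample, 0.5)[1], clr(sample, 1.0)[1])
```

Because the zero is mapped to a value determined by the pseudocount rather than by the data, this kind of normalization cannot distinguish structural zeros from low-abundance counts, which motivates modeling the zeros explicitly.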