Abstract

Background
Recent advances in next-generation sequencing (NGS) technology enable researchers to collect a large volume of metagenomic sequencing data. These data provide valuable resources for investigating interactions between the microbiome and host environmental/clinical factors. Microbiome count measurements have well-known properties, such as varied total sequence reads across samples, over-dispersion and zero-inflation. In addition, microbiome studies usually collect samples with hierarchical structures. These structures introduce correlation among the samples and further complicate the analysis and interpretation of microbiome count data.

Results
In this article, we propose negative binomial mixed models (NBMMs) for detecting the association between the microbiome and host environmental/clinical factors in correlated microbiome count data. The proposed mixed-effects models do not address zero-inflation. They do account for correlation among the samples by incorporating random effects into the commonly used fixed-effects negative binomial model, and they efficiently handle over-dispersion and varying total reads. We have developed a flexible and efficient iterative weighted least squares (IWLS) algorithm that fits the proposed NBMMs by taking advantage of the standard procedure for fitting linear mixed models.

Conclusions
We evaluate and demonstrate the proposed method through extensive simulation studies and an application to mouse gut microbiome data. The results show that the proposed method has desirable properties and outperforms the previously used methods in terms of both empirical power and Type I error. The method has been incorporated into the freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/ and http://github.com/abbyyan3/BhGLM), providing a useful tool for analyzing microbiome data.
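For concreteness, here is a sketch of the model class described above. The notation is assumed here and is not taken from the paper: i indexes clusters such as individual mice, j indexes samples within a cluster, and T_ij is the total read count of sample j in cluster i. The counts of one taxon follow a negative binomial distribution. Its log-mean combines an offset for sequencing depth, fixed effects X_ij β and random effects Z_ij b_i. The last two lines show one standard way an IWLS scheme can reuse linear mixed model machinery: linearize around the current fit to obtain a working response z_ij and weights w_ij, then fit a weighted linear mixed model. These working quantities follow the usual GLMM linearization and are an assumption, not a transcription of the authors' algorithm. The full text specifies the exact procedure, including how the dispersion θ and the scale φ are updated.

```latex
\begin{aligned}
y_{ij} &\sim \mathrm{NB}(\mu_{ij}, \theta), \qquad
  \mathrm{E}(y_{ij}) = \mu_{ij}, \quad
  \mathrm{Var}(y_{ij}) = \mu_{ij} + \mu_{ij}^{2}/\theta, \\
\log(\mu_{ij}) &= \log(T_{ij}) + X_{ij}\beta + Z_{ij} b_i, \qquad
  b_i \sim \mathrm{N}(0, \Psi), \\[4pt]
z_{ij} &= \log(\mu_{ij}) - \log(T_{ij}) + \frac{y_{ij} - \mu_{ij}}{\mu_{ij}}, \qquad
  w_{ij} = \frac{\mu_{ij}}{1 + \mu_{ij}/\theta}, \\
z_{ij} &= X_{ij}\beta + Z_{ij} b_i + e_{ij}, \qquad
  e_{ij} \sim \mathrm{N}(0, \phi / w_{ij}).
\end{aligned}
```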

Highlights

  • Recent advances in next-generation sequencing (NGS) technology enable researchers to collect a large volume of metagenomic sequencing data

  • High-throughput microbiome datasets generated by 16S ribosomal RNA (rRNA) gene sequencing or shotgun metagenomic sequencing have properties that require tailored analytic tools, including compositional count structure, varied total sequence reads across samples, over-dispersion and zero-inflation

  • We propose negative binomial mixed models (NBMMs) for directly modeling the raw microbiome count data, which bypasses the need for transformation
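When raw counts are modeled directly, sequencing depth enters the model as an offset, log(total reads), rather than through normalization or transformation. The sketch below illustrates this with the fixed-effects negative binomial model that NBMMs extend. It fits log(mu_i) = log(T_i) + b0 + b1·x_i by IWLS and holds the dispersion theta fixed. Several things are assumptions made for illustration: the helper names (`nb_iwls`, `rnbinom`, `rpois`), the single covariate, the known theta and the simulated two-group data. This is not the BhGLM implementation. In the mixed-model version, the weighted least-squares step would be replaced by a weighted linear mixed model fit.

```python
import math
import random


def rpois(lam, rng):
    """Poisson draw via Knuth's multiplication method (fine for modest means)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def rnbinom(mu, theta, rng):
    """NB draw as a gamma-Poisson mixture: mean mu, variance mu + mu^2/theta."""
    lam = rng.gammavariate(theta, mu / theta)  # gamma with shape theta, scale mu/theta
    return rpois(lam, rng)


def nb_iwls(y, x, total_reads, theta, n_iter=50, tol=1e-10):
    """Fit log(mu_i) = log(T_i) + b0 + b1 * x_i by IWLS with theta held fixed.

    Hypothetical helper: one covariate, known dispersion, no random effects.
    """
    offset = [math.log(t) for t in total_reads]
    b0 = math.log((sum(y) + 0.5) / sum(total_reads))  # start at the pooled log rate
    b1 = 0.0
    for _ in range(n_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for yi, xi, oi in zip(y, x, offset):
            eta = oi + b0 + b1 * xi
            mu = math.exp(eta)
            z = (eta - oi) + (yi - mu) / mu  # working response with the offset removed
            w = mu / (1.0 + mu / theta)      # NB working weight under the log link
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted normal equations for (b0, b1)
        det = s_w * s_wxx - s_wx ** 2
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        done = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if done:
            break
    return b0, b1


if __name__ == "__main__":
    rng = random.Random(1)
    n = 200
    x = [i % 2 for i in range(n)]                    # hypothetical two-group design
    T = [rng.randint(2000, 8000) for _ in range(n)]  # varied total reads per sample
    b0_true, b1_true, theta = math.log(0.005), 0.7, 2.0
    y = [rnbinom(t * math.exp(b0_true + b1_true * xi), theta, rng) for t, xi in zip(T, x)]
    b0_hat, b1_hat = nb_iwls(y, x, T, theta)
    print(f"b0 = {b0_hat:.3f} (true {b0_true:.3f}), b1 = {b1_hat:.3f} (true {b1_true})")
```

Sequencing depth enters only through the offset. Samples with different library sizes are therefore compared on the rate scale, without rarefying or transforming the counts.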


Summary

Introduction

The advent of next-generation sequencing (NGS) technology enables the generation of a large volume of metagenomic sequencing data at moderate cost [1,2,3]. These data provide valuable resources for investigating interactions between the microbiome and host environmental/clinical factors. They also open a new era of metagenomics studies that explore microbial communities sampled directly from their environments, without the need for cultivation [4,5,6]. Several zero-inflated models have been proposed to correct for excess zero counts in microbiome measurements, including zero-inflated Gaussian, lognormal, negative binomial and beta models [25,29,30,31,32].
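Comparing zero probabilities under candidate count models shows why excess zeros get special treatment. The sketch below uses hypothetical function names and parameter values. It computes P(Y = 0) for a negative binomial with mean mu and dispersion theta, which is (theta/(theta + mu))^theta. It then does the same for a zero-inflated NB that mixes in a point mass pi of structural zeros. It illustrates the generic zero-inflated construction, not any specific model from the cited works.

```python
import math


def nb_zero_prob(mu, theta):
    """P(Y = 0) for a negative binomial with mean mu and dispersion theta."""
    return (theta / (theta + mu)) ** theta


def zinb_zero_prob(mu, theta, pi):
    """P(Y = 0) when a structural-zero mass pi is mixed with the NB count model."""
    return pi + (1.0 - pi) * nb_zero_prob(mu, theta)


if __name__ == "__main__":
    mu = 5.0  # hypothetical mean count for one taxon
    print(f"Poisson:        {math.exp(-mu):.4f}")
    print(f"NB (theta=0.5): {nb_zero_prob(mu, 0.5):.4f}")
    print(f"ZINB (pi=0.3):  {zinb_zero_prob(mu, 0.5, 0.3):.4f}")
```

Over-dispersion alone already raises the zero probability well above the Poisson value. A zero-inflated model is warranted when the observed fraction of zeros exceeds even what the fitted negative binomial allows.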

