Abstract
The analyses of large volumes of metagenomic data extracted from aggregate populations of microscopic organisms residing on and in the human body are advancing contemporary understandings of the integrated participation of microbes in human health and disease. Next generation sequencing technology facilitates these analyses in terms of diversity, community composition, and differential abundance by filtering and binning microbial 16S rRNA genes extracted from human tissues into operational taxonomic units. However, current statistical tools restrict study designs to investigations of limited numbers of host characteristics mediated by limited numbers of samples, potentially yielding a loss of relevant information. This paper presents a Bayesian hierarchical negative binomial model as an efficient technique capable of accommodating multivariable sets including tens or hundreds of host characteristics as covariates, further expanding analyses of human microbiome count data. Simulation studies reveal that the Bayesian hierarchical negative binomial model provides a desirable strategy by often outperforming three competing negative binomial models in terms of type I error while simultaneously maintaining consistent power. An application of the Bayesian hierarchical negative binomial model to subsets of the open data published by the American Gut Project demonstrates an ability to identify operational taxonomic units that differ significantly among persons diagnosed by a medical professional with either inflammatory bowel disease or irritable bowel syndrome, and the units identified are consistent with contemporary gastrointestinal literature.
Highlights
Objectives: Our aim is to assess whether operational taxonomic units (OTUs) are significantly differentiable among subjects identified to have a disease or condition of interest in comparison to healthy controls by explicitly adjusting for dependencies between covariates
The Bayesian hierarchical negative binomial (HNB) model avoids these steps, allowing raw counts to be modeled directly by incorporating library sizes as a modeling offset
We have shown the proposed method is capable of simultaneously adjusting for multivariable sets of tens or hundreds of clinical, physiological, environmental, behavioral, demographic, and/or genetic sample host characteristics, which is not always attainable by the classical negative binomial (NB) model implemented in MASS or modified NB models implemented in edgeR or DESeq2 when the number of samples is restricted
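The role of the library-size offset mentioned in the highlights can be illustrated with a minimal sketch. It assumes the common log-link parameterization, in which the log of the expected count equals the log library size plus a linear predictor, and a negative binomial with mean `mu` and dispersion `theta` (variance `mu + mu**2 / theta`). The function names, covariate values, and coefficients below are hypothetical, chosen only for illustration. The sketch covers the likelihood alone and does not implement the hierarchical priors of the HNB model.

```python
import math


def nb_log_pmf(y, mu, theta):
    """Log-probability of count y under a negative binomial with mean mu
    and dispersion theta (assumed parameterization: var = mu + mu**2 / theta)."""
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def expected_count(library_size, covariates, coefficients, intercept):
    """Expected raw count with log(library_size) as an offset, i.e. a term
    in the linear predictor whose coefficient is fixed at 1."""
    linear_predictor = intercept + sum(
        x * b for x, b in zip(covariates, coefficients)
    )
    return math.exp(math.log(library_size) + linear_predictor)


# Hypothetical toy values: two samples with identical host characteristics
# but different sequencing depths.
covariates = [1.0, 0.5]
coefficients = [0.4, -0.2]
intercept = -6.0

mu_small = expected_count(10_000, covariates, coefficients, intercept)
mu_large = expected_count(20_000, covariates, coefficients, intercept)

# Log-likelihood contribution of one observed raw count in the smaller library.
loglik = nb_log_pmf(30, mu_small, theta=2.0)
```

Because the offset enters with a fixed coefficient of 1, doubling the library size doubles the expected count while the covariate effects are unchanged, which is why raw counts can be modeled without first being rescaled.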
Summary
Our aim is to assess whether OTUs are significantly differentiable among subjects identified to have a disease or condition of interest in comparison to healthy controls by explicitly adjusting for dependencies between covariates. We aim to highlight OTUs known to be significantly associated with the stated diseases while adjusting for numerous host characteristics, such as dietary behaviors and systemic practices, as covariates. As in many existing methods, features are tested one at a time to determine whether the abundance of a microbial taxon is statistically associated with host characteristics.