Abstract
Researchers have increasingly employed family-based or longitudinal study designs to survey the roles of the human microbiota in diverse host traits of interest (e.g., health/disease status, medical intervention, behavioral/environmental factor). Such study designs are useful for properly controlling for potential confounders or for sensitive changes in microbial composition and host traits. However, downstream data analysis is challenging because the measurements within clusters (e.g., families, or subjects with repeated measures) tend to be correlated, so statistical methods based on the independence assumption cannot be used. For correlated microbiome studies, a distance-based kernel association test based on the linear mixed model, namely, the correlated sequence kernel association test (cSKAT), has recently been introduced. cSKAT models the microbial community using an ecological distance (e.g., Jaccard/Bray-Curtis dissimilarity, unique fraction distance), and then tests its association with a host trait. As with prior distance-based kernel association tests (e.g., the microbiome regression-based kernel association test), the use of ecological distances gives cSKAT high power. However, cSKAT is limited to Gaussian traits [e.g., body mass index (BMI)] and to a single chosen distance measure at a time. Moreover, the power of cSKAT varies substantially with the distance measure used, yet choosing an optimal distance measure is challenging because the true nature of the association is unknown. Here, we introduce a distance-based kernel association test based on the generalized linear mixed model (GLMM), namely, GLMM-MiRKAT, to handle diverse types of traits, such as Gaussian (e.g., BMI), binomial (e.g., disease status, treatment/placebo), or Poisson (e.g., number of tumors/treatments) traits. We further propose a data-driven adaptive test of GLMM-MiRKAT, namely, aGLMM-MiRKAT, to avoid the need to choose the optimal distance measure.
Our extensive simulations demonstrate that aGLMM-MiRKAT is robustly powerful while correctly controlling type I error rates. We apply aGLMM-MiRKAT to real familial and longitudinal microbiome data, where we discover significant disparity in microbial community composition by BMI status and by the frequency of antibiotic use. In summary, aGLMM-MiRKAT is a useful analytical tool owing to its broad applicability to diverse types of traits, robust power, and valid statistical inference.
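The abstract describes modeling the microbial community through an ecological distance but does not spell out how a distance becomes a kernel. The sketch below illustrates one common construction for distance-based kernel tests, assumed here rather than taken from the abstract: compute Bray-Curtis dissimilarities between samples, double-center the squared distance matrix, and truncate negative eigenvalues so the kernel is positive semi-definite. The function names `bray_curtis` and `distance_to_kernel` and the toy count table are hypothetical and for illustration only; this is not the authors' implementation.

```python
import numpy as np


def bray_curtis(counts):
    """Pairwise Bray-Curtis dissimilarity between rows (samples) of a count table."""
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            num = np.abs(counts[i] - counts[j]).sum()
            den = (counts[i] + counts[j]).sum()
            # Two empty samples are treated as identical (assumption).
            dist[i, j] = dist[j, i] = num / den if den > 0 else 0.0
    return dist


def distance_to_kernel(dist):
    """Turn a distance matrix into a positive semi-definite kernel.

    Assumed construction: K = -1/2 * J * D^2 * J with J = I - 11'/n,
    followed by setting negative eigenvalues to zero.
    """
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    kernel = -0.5 * centering @ (dist ** 2) @ centering
    kernel = (kernel + kernel.T) / 2  # remove floating-point asymmetry
    eigvals, eigvecs = np.linalg.eigh(kernel)
    eigvals = np.clip(eigvals, 0.0, None)  # drop negative eigenvalues
    return (eigvecs * eigvals) @ eigvecs.T


# Hypothetical toy data: 4 samples (rows) by 4 taxa (columns).
toy_counts = np.array([
    [10, 0, 5, 1],
    [8, 1, 6, 0],
    [0, 12, 1, 7],
    [1, 10, 0, 9],
])
D = bray_curtis(toy_counts)
K = distance_to_kernel(D)
```

A kernel built this way from each candidate distance could then be passed to a kernel association test; an adaptive test would combine the results across several such kernels instead of committing to one distance in advance.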
Highlights
We introduce a distance-based kernel association test based on the generalized linear mixed model (GLMM), namely, GLMM-MiRKAT, to handle diverse types of traits, such as Gaussian [e.g., body mass index (BMI)], binomial, or Poisson traits.
We introduced a distance-based kernel association test based on the generalized linear mixed model, GLMM-MiRKAT, for correlated microbiome studies.
Summary
The recent surge in next-generation sequencing technologies has dramatically advanced human microbiome studies by enabling generic characterization of the microbes in the human body (Hamady and Knight, 2009; Caporaso et al., 2010; Thomas et al., 2012). We focus on microbiome association studies, which test for disparity in microbial community (e.g., bacterial kingdom) composition by a host trait of interest (e.g., health/disease status, clinical intervention, behavioral/environmental factor) (Li, 2015). Recent studies have found disparity in microbial community composition for a variety of health/disease statuses [e.g., obesity (Arslan, 2014), type I diabetes (Zhang et al., 2018a), type II diabetes (Qin et al., 2012), human immunodeficiency virus (Bandera et al., 2018), inflammatory bowel disease (Knights et al., 2013; Borren et al., 2018), and cancers (Zitvogel et al., 2015)], medical interventions [e.g., administration of antibiotics (Zhang et al., 2018a)], and behavioral/environmental factors [e.g., diet, residence, smoking, and birth mode (Charlson et al., 2010; Liu et al., 2017)].