Abstract

Microbiome data consists of operational taxonomic unit (OTU) counts characterized by zero-inflation, over-dispersion, and grouping structure among samples. Currently, statistical testing methods are commonly used to identify OTUs that are associated with a phenotype. These methods have two limitations: the validity of p-values and q-values depends sensitively on the correctness of the model, and statistical significance does not necessarily imply predictivity. Predictive analysis using methods such as LASSO is an alternative approach for identifying associated OTUs and for measuring how well the phenotype variable can be predicted from OTUs and other covariates. We investigate three strategies for predictive analysis: (1) LASSO: fitting a LASSO multinomial logistic regression model to all OTU counts after a specific transformation; (2) screening+GLM: screening OTUs with q-values obtained by fitting a generalized linear mixed-effects model (GLMM) to each OTU, then fitting a generalized linear model (GLM) to the selected subset of OTUs; (3) screening+LASSO: fitting a LASSO to a subset of OTUs selected with the GLMM. We conducted empirical studies using three simulated datasets generated from Dirichlet-multinomial models and a real gut microbiome dataset related to Parkinson’s disease to investigate the performance of the three strategies. Our simulation studies show that LASSO with an appropriate variable transformation predicts remarkably well on zero-inflated data. Our real data analysis shows that Parkinson’s disease can be predicted with high accuracy from age, sex, and selected OTUs after binary transformation (Error Rate = 0.199, AUC = 0.872, AUPRC = 0.912). These results provide strong evidence of a relationship between Parkinson’s disease and the gut microbiome.
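As a rough illustration of strategy (1), the sketch below applies a binary (presence/absence) transformation to OTU counts and fits an L1-penalized logistic regression by proximal gradient descent. It is a two-class simplification written with the Python standard library only, not the authors' implementation: the function names (binarize, soft_threshold, fit_l1_logistic), the penalty weight lam, the step size, and the toy counts are all assumptions made for this example.

```python
import math

def binarize(counts):
    """Presence/absence transformation: 1 if the OTU was observed, else 0."""
    return [[1 if c > 0 else 0 for c in row] for row in counts]

def soft_threshold(z, t):
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def fit_l1_logistic(X, y, lam=0.05, step=0.5, n_iter=500):
    """Minimise mean logistic loss + lam * sum(|w_j|) by proximal gradient
    descent. The intercept b is not penalised."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        resid = []
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            resid.append(1.0 / (1.0 + math.exp(-z)) - yi)
        grad_b = sum(resid) / n
        grad_w = [sum(r * xi[j] for r, xi in zip(resid, X)) / n for j in range(p)]
        # Gradient step on the smooth loss, then soft-thresholding for the penalty.
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b

# Made-up counts for 8 samples x 3 OTUs (illustration only, not study data).
counts = [[5, 0, 0], [12, 3, 0], [7, 0, 0], [0, 1, 0],   # cases
          [0, 2, 0], [0, 0, 0], [3, 0, 0], [0, 4, 0]]    # controls
labels = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = fit_l1_logistic(binarize(counts), labels)
selected = [j for j, wj in enumerate(w) if wj != 0.0]
print("coefficients:", [round(wj, 3) for wj in w], "selected OTUs:", selected)
```

OTUs whose coefficient is shrunk exactly to zero are dropped, which is how a LASSO fit doubles as feature selection. In practice the penalty weight would be tuned, for example by cross-validation, rather than fixed as it is here.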

Highlights

  • The microbiome comprises all of the genetic material within a microbiota

  • We consider three strategies of predictive analysis for microbiome data: (1) fitting a LASSO multinomial logistic regression (LASSO-MLR) model to all operational taxonomic unit (OTU) counts after a specific transformation, abbreviated as LASSO; (2) screening OTUs with q-values obtained by fitting a generalized linear mixed-effects model (GLMM) to each OTU, then fitting a generalized linear model (GLM) to the selected subset of OTUs, abbreviated as screening+GLM; (3) fitting a LASSO-MLR to a subset of OTUs selected with the GLMM, abbreviated as screening+LASSO

  • The test set is used to check the predictive performance of selected features through GLM or LASSO-MLR
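The screening and test-set steps described above can be sketched as follows. The example assumes that q-values are Benjamini-Hochberg adjusted p-values, which the text does not state, and it does not fit a GLMM: the per-OTU p-values, the 0.1 cut-off, and the test-set labels and predicted probabilities are hypothetical inputs chosen for illustration.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (one common definition of q-values)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

def error_rate(y_true, prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction disagrees with the label."""
    wrong = sum(int(p >= threshold) != y for y, p in zip(y_true, prob))
    return wrong / len(y_true)

def auc(y_true, prob):
    """AUC as the chance that a random positive outranks a random negative
    (ties count one half)."""
    pos = [p for y, p in zip(y_true, prob) if y == 1]
    neg = [p for y, p in zip(y_true, prob) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical p-values from per-OTU models fitted on the training set.
pvals = [0.01, 0.04, 0.03, 0.20]
qvals = bh_qvalues(pvals)
selected = [j for j, q in enumerate(qvals) if q < 0.1]  # assumed cut-off

# Hypothetical test-set labels and predicted probabilities from a model
# fitted on the selected OTUs.
y_test = [1, 1, 0, 0, 1, 0]
p_test = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

print("selected OTUs:", selected)
print("error rate:", round(error_rate(y_test, p_test), 3))
print("AUC:", round(auc(y_test, p_test), 3))
```

Keeping the screening on the training set and the metrics on the test set matters: if OTUs were screened using all samples, the reported error rate and AUC would be optimistically biased.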


Summary

Introduction

The microbiome comprises all of the genetic material within a microbiota (the entire collection of microorganisms in a specific niche, such as the human gut). This work on predictive analysis methods for human microbiome data was supported by a grant (FUNDER number: RGPIN-2019-07020) and by the Canada Foundation of Innovations (CFI). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

