Powerful and robust non-parametric association testing for microbiome data via a zero-inflated quantile approach (ZINQ)

Wodan Ling,Katie A Meyer,Michael C Wu,Anthony A Fodor,Ni Zhao,Lenore J Launer,Anna M Plantinga

doi:10.1186/s40168-021-01129-3

Abstract

Background: Identification of bacterial taxa associated with diseases, exposures, and other variables of interest offers a more comprehensive understanding of the role of microbes in many conditions. However, despite considerable research in statistical methods for association testing with microbiome data, approaches that are generally applicable remain elusive. Classical tests often do not accommodate the realities of microbiome data, leading to power loss. Approaches tailored for microbiome data depend highly upon the normalization strategies used to handle differential read depth and other data characteristics, and they often have unacceptably high false positive rates, generally due to unsatisfied distributional assumptions. On the other hand, many non-parametric tests suffer from loss of power and may also present difficulties in adjusting for potential covariates. Most extant approaches also fail in the presence of heterogeneous effects. The field needs new non-parametric approaches that are tailored to microbiome data, robust to distributional assumptions, and powerful under heterogeneous effects, while permitting adjustment for covariates.

Methods: As an alternative to existing approaches, we propose a zero-inflated quantile approach (ZINQ), which uses a two-part quantile regression model to accommodate the zero inflation in microbiome data. For a given taxon, ZINQ consists of a valid test in logistic regression to model the zero counts, followed by a series of quantile rank-score based tests on multiple quantiles of the non-zero part with adjustment for the zero inflation. As a regression and quantile-based approach, the method is non-parametric and robust to irregular distributions, while providing an allowance for covariate adjustment. Since no distributional assumptions are made, ZINQ can be applied to data that has been processed under any normalization strategy.

Results: Thorough simulations based on real data across a range of scenarios and application to real data sets show that ZINQ often has equivalent or higher power compared to existing tests even as it offers better control of false positives.

Conclusions: We present ZINQ, a quantile-based association test between microbiota and dichotomous or quantitative clinical variables, providing a powerful and robust alternative for the current microbiome differential abundance analysis.
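The two-part idea described in the Methods paragraph of the abstract can be illustrated with a minimal sketch. It is not the ZINQ implementation. It assumes a single variable of interest `x` and no other covariates, so the null model contains only an intercept. It also omits the adjustment of the quantile tests for zero inflation, and it combines the component p-values with a Cauchy combination, which is a choice made for this sketch because the text above does not say how the tests are combined. All function names are hypothetical.

```python
import math


def chi2_1_sf(stat):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


def score_pvalue(x, scores, unit_var):
    """Score-type test of association between x and per-sample scores.

    Assumes an intercept-only null model, so centering x is enough to
    account for the intercept. unit_var is the null variance of one score.
    """
    xbar = sum(x) / len(x)
    u = sum((xi - xbar) * s for xi, s in zip(x, scores))
    v = unit_var * sum((xi - xbar) ** 2 for xi in x)
    return 1.0 if v <= 0 else chi2_1_sf(u * u / v)


def zero_part_pvalue(x, y):
    """Logistic-regression score test of presence/absence (y > 0) against x."""
    present = [1.0 if yi > 0 else 0.0 for yi in y]
    p_hat = sum(present) / len(present)
    return score_pvalue(x, [d - p_hat for d in present], p_hat * (1.0 - p_hat))


def quantile_part_pvalue(x, y, tau):
    """Quantile rank-score test at level tau, using non-zero observations only."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi > 0]
    if len(pairs) < 2:
        return 1.0
    values = sorted(yi for _, yi in pairs)
    # Sample tau-quantile: the fitted value of an intercept-only quantile model.
    q = values[max(0, math.ceil(tau * len(values)) - 1)]
    scores = [tau - (1.0 if yi < q else 0.0) for _, yi in pairs]
    return score_pvalue([xi for xi, _ in pairs], scores, tau * (1.0 - tau))


def cauchy_combine(pvalues):
    """Cauchy combination of p-values (an assumption of this sketch)."""
    clipped = [min(max(p, 1e-15), 1.0 - 1e-15) for p in pvalues]
    t = sum(math.tan((0.5 - p) * math.pi) for p in clipped) / len(clipped)
    return 0.5 - math.atan(t) / math.pi


def two_part_test(x, y, taus=(0.25, 0.5, 0.75)):
    """One zero-part test plus one rank-score test per quantile level, combined."""
    pvals = [zero_part_pvalue(x, y)]
    pvals += [quantile_part_pvalue(x, y, tau) for tau in taus]
    return cauchy_combine(pvals)


if __name__ == "__main__":
    # Toy taxon: group 1 has more zeros and larger non-zero abundances.
    x = [0] * 50 + [1] * 50
    y = [0] * 10 + list(range(1, 41)) + [0] * 30 + list(range(101, 121))
    print("combined p-value:", two_part_test(x, y))
```

Because each component is a score-type test, only the null model has to be fitted. Adjusting for covariates would replace the centering of `x` and the sample quantile with fits of logistic and quantile regressions on those covariates, which this sketch does not attempt.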

Highlights

Identification of bacterial taxa associated with diseases, exposures, and other variables of interest offers a more comprehensive understanding of the role of microbes in many conditions
We considered centered log-ratio (CLR) normalization, which removes the compositionality of microbiome data, and compared the applicable methods (LDM, linear regression, and QRank) with the zero-inflated quantile approach (ZINQ)
ZINQ is even more powerful than zero-inflated negative binomial (ZINB) in Setting 2, with more than 83% of the true differentially abundant taxa detected, compared to 73% for ZINB
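For readers unfamiliar with the CLR normalization mentioned in the highlights, the following is a minimal sketch of the transform for one sample. The pseudocount of 0.5 used to handle zero counts is an assumption of this sketch, not a value taken from the text, and the function name `clr` is hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    Each value becomes log(count) minus the mean of the logs across taxa,
    so the transformed values sum to zero. The pseudocount (an assumption
    here) keeps log() defined when a count is zero.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


if __name__ == "__main__":
    print(clr([0, 3, 10, 120]))
```

Without a pseudocount, the transform is unchanged when all counts in a sample are multiplied by the same factor, which is why it is commonly used to handle differing read depth.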

Summary

Introduction

Identification of bacterial taxa associated with diseases, exposures, and other variables of interest offers a more comprehensive understanding of the role of microbes in many conditions. A recurring objective of human microbiome profiling studies is to identify individual bacterial taxa that are associated with experimental conditions, exposures, or other outcome variables of interest. Such trait-associated taxa (referred to as differentially abundant taxa, for simplicity) can provide clues to the biological mechanisms underlying conditions or responses and facilitate follow-up investigations of the impact of microorganisms on human diseases, leading to novel preventive or therapeutic strategies. Most differential abundance analysis approaches tailored towards genomic and microbiome data assume a statistical distribution for the transformed read counts, causing inflated false positive findings when these assumptions fail. Classical statistical methods such as Wilcoxon tests are conservative: they control type I error but lose power because they fail to fully exploit the data characteristics.

Objectives

Methods

Results

Discussion

Conclusion