Abstract

We introduce the Robust Logistic Zero-Sum Regression (RobLZS) estimator, which can be used for a two-class problem with high-dimensional compositional covariates. Since the log-contrast model is employed, the estimator is able to do feature selection among the compositional parts. The proposed method attains robustness by minimizing a trimmed sum of deviances. A comparison of the performance of the RobLZS estimator with a non-robust counterpart and with other sparse logistic regression estimators is conducted via Monte Carlo simulation studies. Two microbiome data applications are considered to investigate the stability of the estimators to the presence of outliers. Robust Logistic Zero-Sum Regression is available as an R package that can be downloaded at https://github.com/giannamonti/RobZS.
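The abstract's central idea is a trimmed sum of deviances under a log-contrast model. The Python sketch below evaluates that kind of criterion for a given coefficient vector. It assumes strictly positive compositions (zeros already replaced), a binary 0/1 response, and an elastic-net penalty weighted by h·λ. The penalty form, its scaling, and every function and parameter name are assumptions for illustration. They are not the API of the RobZS package or a definitive statement of the paper's objective.

```python
import numpy as np

def binomial_deviances(y, eta):
    """Per-observation deviance d_i = 2*(log(1 + exp(eta_i)) - y_i*eta_i) for y_i in {0, 1}."""
    # np.logaddexp(0, eta) = log(1 + exp(eta)), computed without overflow
    return 2.0 * (np.logaddexp(0.0, eta) - y * eta)

def trimmed_objective(y, X_comp, beta0, beta, h, lam=0.0, alpha=1.0):
    """Sum of the h smallest deviances plus a hypothetical elastic-net penalty.

    y      : 0/1 responses, shape (n,)
    X_comp : strictly positive compositions, shape (n, p)
    beta   : log-contrast coefficients, required to sum to zero
    h      : number of observations kept in the trimmed sum
    """
    if not np.isclose(beta.sum(), 0.0):
        raise ValueError("beta must satisfy the zero-sum constraint")
    eta = beta0 + np.log(X_comp) @ beta      # log-contrast linear predictor
    d = binomial_deviances(y, eta)
    kept = np.sort(d)[:h]                    # drop the n - h largest deviances
    # Assumed penalty: elastic net, scaled by h (illustrative choice only)
    penalty = lam * ((1 - alpha) / 2 * np.sum(beta ** 2) + alpha * np.sum(np.abs(beta)))
    return kept.sum() + h * penalty

# Usage: with all coefficients zero every deviance equals 2*log(2)
y = np.array([1, 0, 1, 0])
X = np.array([[0.2, 0.3, 0.5], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2], [0.3, 0.3, 0.4]])
print(trimmed_objective(y, X, 0.0, np.zeros(3), h=3))  # 6*log(2), about 4.1589
```

Observations with the largest deviances, such as a mislabelled sample with an extreme linear predictor, are excluded from the sum and cannot dominate the criterion. This sketch only evaluates the objective. Searching over subsets and coefficients is the task of the algorithm described in the paper.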

Highlights

  • Over the past decade, interest in the role of the microbiome in human health has increased, especially in studies of the association between a medical status and the microbial communities, which provide new ways to classify individuals and to predict their disease risk (Qin et al 2010)

  • We introduce the Robust Logistic Zero-Sum Regression (RobLZS) estimator, which can be used for a two-class problem with high-dimensional compositional covariates

  • We illustrate the performance of our proposed estimator by applying it to two human microbiome datasets: the first concerns inflammatory bowel diseases (IBD) (Morgan et al 2012), and the second concerns Parkinson’s disease (PD) (Dong et al 2020)


Summary

Introduction

Interest in the role of the microbiome in human health has increased, especially in studies of the association between a medical status and the microbial communities, which provide new ways to classify individuals and to predict their disease risk (Qin et al 2010). Sequencing yields vectors of bacterial taxa abundances, which are generally clustered into operational taxonomic units (OTUs) at different taxonomic levels. The analysis of these data is a statistical and computational challenge, as they are typically high-dimensional, sparse, zero-inflated due to the presence of many rare taxa, and compositional (Gloor et al 2017). This paper considers logistic regression analysis of microbiome compositional data, with the aim of identifying the bacterial taxa that are associated with a dichotomous response, such as a medical status of interest. Several robust sparse estimators already exist: Avella-Medina and Ronchetti (2017) proposed a robust penalized quasi-likelihood estimator for generalized linear models, Park and Konishi (2016) suggested a robust penalized logistic regression based on a weighted likelihood methodology, and Kurnaz et al (2018) adopted a trimmed elastic net estimator for linear and logistic regression. None of these options satisfies the zero-sum constraint. This paper presents a Robust Logistic Zero-Sum Regression (RobLZS) model with compositional explanatory variables.
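The zero-sum constraint is what makes a log-contrast model insensitive to the total of each sample. The short NumPy check below uses made-up Dirichlet compositions and hypothetical per-sample read totals, not data from the paper. Rescaling each row by an arbitrary positive factor, such as a different sequencing depth, leaves the linear predictor unchanged when the coefficients sum to zero, but not otherwise.

```python
import numpy as np

def log_contrast_eta(X, beta0, beta):
    """Linear predictor beta0 + sum_j beta_j * log(x_ij) of the log-contrast model."""
    return beta0 + np.log(X) @ beta

rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(5), size=3)              # 3 samples, 5 parts; rows sum to 1
depth = np.array([[1000.0], [250.0], [50000.0]])   # hypothetical per-sample read totals
counts = X * depth                                 # same compositions, different scale

beta_zero_sum = np.array([1.0, -0.5, 0.0, -0.5, 0.0])  # coefficients sum to 0
beta_free = np.array([1.0, 0.0, 0.0, 0.0, 0.0])        # coefficients sum to 1

print(np.allclose(log_contrast_eta(X, 0.2, beta_zero_sum),
                  log_contrast_eta(counts, 0.2, beta_zero_sum)))  # True
print(np.allclose(log_contrast_eta(X, 0.2, beta_free),
                  log_contrast_eta(counts, 0.2, beta_free)))      # False
```

Since log(c·x_ij) = log c + log x_ij, rescaling row i shifts its predictor by log(c)·Σ_j β_j. This shift vanishes exactly when the coefficients sum to zero.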

Sparse logistic regression models with compositional covariates
The RobLZS estimator
Algorithm
Parameter selection
Simulations
Sampling schemes
Performance measures
Simulation results
Applications to microbiome data
Results for the IBD data
Conclusions