Abstract

We propose a robust procedure to estimate a linear regression model with compositional and real-valued explanatory variables. The procedure is designed to be robust against individual outlying cells in the data matrix (cellwise outliers) as well as entire outlying observations (rowwise outliers). Cellwise outliers are first filtered and then imputed by robust estimates. Afterwards, rowwise robust compositional regression is performed to obtain model coefficient estimates. Simulations show that the procedure generally outperforms a traditional rowwise-only robust regression method (MM-estimator). Moreover, our procedure yields results that are better than or comparable to those of recently proposed cellwise robust regression methods (shooting S-estimator, 3-step regression), while being easier to interpret because it uses appropriate coordinate systems for compositional data. An application to bio-environmental data reveals that the proposed procedure, compared to other regression methods, leads to conclusions that are best aligned with established scientific knowledge.

Highlights

  • Regression analysis is one of the most widely used techniques in practical data analysis and statistical modelling

  • The task is to transfer the information about the cellwise outliers in L to X = (x_1, ..., x_D, r_1, ..., r_{p+1}). For the real-valued variables the outlier flags carry over unchanged; for the compositional parts, we propose to mark a part x_{ij} in X as a cellwise outlier if at least half of the logratios containing x_{ij} are identified as outliers by the bivariate filter

  • The mean squared error (MSE) of the ideal filter with multiple imputation (IF-MI) remains fairly low under 10% contamination, which indicates that the outlier filtering step is crucial for the performance of our proposed method; under 20% contamination, however, the MSE of IF-MI increases as well
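
The rule in the highlights for transferring outlier flags from pairwise logratios to compositional parts can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `flag_compositional_cells`, the input layout (one boolean per pairwise logratio, ordered as all pairs (j, k) with j < k), and the reading of "at least half" as count >= (D - 1) / 2 are assumptions made for the example.

```python
from itertools import combinations


def flag_compositional_cells(logratio_flags, n_parts):
    """Mark a compositional part as a cellwise outlier when at least half
    of the pairwise logratios containing it were flagged.

    logratio_flags: one row per observation; each row holds one boolean per
        pairwise logratio, ordered like combinations(range(n_parts), 2).
        (Assumed layout, chosen for this sketch.)
    n_parts: number of compositional parts D.
    """
    if n_parts < 2:
        raise ValueError("need at least two compositional parts")
    pairs = list(combinations(range(n_parts), 2))

    cell_flags = []
    for row in logratio_flags:
        if len(row) != len(pairs):
            raise ValueError("expected one flag per pairwise logratio")
        # Count, for every part, how many flagged logratios involve it.
        counts = [0] * n_parts
        for flagged, (j, k) in zip(row, pairs):
            if flagged:
                counts[j] += 1
                counts[k] += 1
        # Each part appears in exactly n_parts - 1 pairwise logratios.
        cell_flags.append([2 * c >= n_parts - 1 for c in counts])
    return cell_flags


# Hypothetical example with D = 4 parts; pairs are ordered
# (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
# Only the logratios (0,1) and (0,2) are flagged for this observation.
example = [[True, True, False, False, False, False]]
print(flag_compositional_cells(example, 4))  # [[True, False, False, False]]
```

In the example, part 0 appears in two of its three logratios that were flagged, so it is marked; parts 1 and 2 each appear in only one flagged logratio and are left unmarked.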


Summary

Introduction

Regression analysis is one of the most widely used techniques in practical data analysis and statistical modelling. It allows one to study how a real-valued response variable is associated with explanatory variables of various types, including variables of a compositional nature (i.e., variables that carry relative information). We introduce a robust estimation procedure for regression analysis with compositional covariates that is designed to handle both cellwise and rowwise outliers. The results indicate that our procedure, which maximizes the use of the information contained in the data set, can cope with moderate levels of cellwise and rowwise contamination, and yields estimates that are better than or comparable to those of its competitors: the aforementioned 3-step regression estimator and shooting S-estimator, as well as the rowwise robust MM-estimator and the ordinary least squares estimator.

Methodological background
Robust compositional regression with cellwise outliers
Detection of cellwise outliers
Imputation of cellwise outliers
Robust compositional regression
Multiple imputation estimates
Simulation design
Simulation results
Illustrative case study
Computation time
Conclusions and discussion
A Pseudocode of the BF-MI algorithm
B On using a separate imputation step
1: Cellwise outlier detection on pairwise logratios and real-valued variables
17: Adopt outlying cells in real-valued variables from bivariate filter
20: Imputations in real-valued variables
1: Detect cellwise outliers
9: Replace cells of X with indices in O by missing values
Findings
41: Compute variance estimates
