Abstract
We propose a robust procedure to estimate a linear regression model with compositional and real-valued explanatory variables. The proposed procedure is designed to be robust against individual outlying cells in the data matrix (cellwise outliers), as well as entire outlying observations (rowwise outliers). Cellwise outliers are first filtered and then imputed by robust estimates. Afterwards, rowwise robust compositional regression is performed to obtain model coefficient estimates. Simulations show that the procedure generally outperforms a traditional rowwise-only robust regression method (MM-estimator). Moreover, our procedure yields results that are better than or comparable to those of recently proposed cellwise robust regression methods (shooting S-estimator, 3-step regression), while being preferable for interpretation through its use of appropriate coordinate systems for compositional data. An application to bio-environmental data reveals that the proposed procedure, compared to other regression methods, leads to conclusions that are best aligned with established scientific knowledge.
Highlights
Regression analysis is one of the most widely used techniques in practical data analysis and statistical modelling.
The task is to transfer the information about the cellwise outliers in $L$ to $X = (x_1, \ldots, x_D, r_1, \ldots, r_{p+1})$. While this is identical for the real-valued variables, we propose to mark a compositional part $x_{ij}$ in $X$ as a cellwise outlier if at least half of the logratios containing $x_{ij}$ are identified as outliers by the bivariate filter.
The mean squared error (MSE) of ideal filter (IF)-multiple imputation (MI) remains fairly low for 10% contamination, which indicates that the outlier filtering step is crucial for the performance of our proposed method, but under 20% contamination the MSE of IF-MI increases as well.
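The rule from the highlights for marking a compositional part as a cellwise outlier (at least half of the logratios containing it are flagged) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `flag_compositional_cells` and the input layout (a boolean array of shape `(n, D, D)` whose entry `[i, j, k]` says whether the logratio of parts `j` and `k` was flagged for observation `i`) are assumptions made for illustration, and the bivariate filter that would produce those flags is not implemented here.

```python
import numpy as np


def flag_compositional_cells(logratio_flags):
    """Transfer pairwise-logratio outlier flags to individual compositional parts.

    logratio_flags: boolean array of shape (n, D, D); entry [i, j, k] is True if
    the logratio between parts j and k was flagged for observation i.
    (Assumed layout; the source does not specify a data structure.)

    Returns a boolean array of shape (n, D): cell (i, j) is marked as a cellwise
    outlier if at least half of the D - 1 logratios containing part j are flagged.
    """
    flags = np.asarray(logratio_flags, dtype=bool)
    if flags.ndim != 3 or flags.shape[1] != flags.shape[2]:
        raise ValueError("expected an array of shape (n, D, D)")
    D = flags.shape[1]

    # ln(x_j / x_k) and ln(x_k / x_j) differ only in sign, so treat the pair
    # as flagged if either orientation is flagged.
    sym = flags | flags.transpose(0, 2, 1)

    # A part has no logratio with itself; ignore the diagonal.
    idx = np.arange(D)
    sym[:, idx, idx] = False

    # Number of flagged logratios that involve each part.
    counts = sym.sum(axis=2)

    # "At least half" of the D - 1 logratios containing the part.
    return 2 * counts >= (D - 1)


# Small example: 2 observations, 4 parts. In observation 0, every logratio
# involving part 0 is flagged; observation 1 has no flags.
example = np.zeros((2, 4, 4), dtype=bool)
example[0, 0, 1] = example[0, 0, 2] = example[0, 0, 3] = True
print(flag_compositional_cells(example))
```

In this example only part 0 of observation 0 is marked: it appears in three flagged logratios out of three, while each of the other parts appears in just one flagged logratio out of three, which is below the one-half threshold.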
Summary
Regression analysis is one of the most widely used techniques in practical data analysis and statistical modelling. It allows one to study how a real-valued response variable is associated with explanatory variables of various types, including variables of a compositional nature (i.e., variables that carry relative information). We introduce a robust estimation procedure for regression analysis with compositional covariates that is designed to handle both cellwise and rowwise outliers. The results indicate that our procedure, which maximizes the use of the information contained in the data set, can cope with moderate levels of cellwise and rowwise contamination, and yields estimates that are better than or comparable to those of its competitors: the aforementioned 3-step regression estimator and shooting S-estimator, as well as the rowwise robust MM-estimator and the ordinary least squares estimator.