Abstract

Sliced Inverse Regression (SIR) has been extensively used to reduce the dimension of the predictor space before performing regression. SIR was originally a model-free method, but it has been shown to correspond to maximum likelihood estimation of an inverse regression model with Gaussian errors. This intrinsic Gaussianity of standard SIR may explain its high sensitivity to outliers, as observed in a number of studies. To improve robustness, the inverse regression formulation of SIR is therefore extended to non-Gaussian errors with heavy-tailed distributions. Considering Student-distributed errors, it is shown that the inverse regression remains tractable via an Expectation–Maximization (EM) algorithm. The algorithm is outlined and tested in the presence of outliers, on both simulated and real data, showing improved results in comparison to a number of other existing approaches.
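As background for the abstract, classical SIR estimates the central subspace by slicing the response, averaging the standardized predictors within each slice, and taking the leading eigenvectors of the between-slice covariance. The sketch below is a generic illustration of this standard procedure (function name and defaults are ours), not the robust Student variant proposed in the paper.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_components=2):
    """Classical Sliced Inverse Regression: estimate directions
    spanning the central subspace (a textbook sketch)."""
    n, p = X.shape
    # Standardize predictors: Z = (X - mean) @ Sigma^{-1/2}
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(Sigma)
    inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.T
    Z = (X - mu) @ inv_sqrt
    # Partition observations into slices of roughly equal size by sorted y
    order = np.argsort(y)
    slices = np.array_split(order, n_slices)
    # Between-slice covariance of the slice means of Z
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M, mapped back to the original predictor scale
    _, v = np.linalg.eigh(M)
    top = v[:, ::-1][:, :n_components]
    return inv_sqrt @ top
```

Because the slice means are ordinary averages, a single gross outlier can pull a slice mean far from the truth, which is the sensitivity the paper's Student formulation addresses.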

Highlights

  • Let us consider a regression setting where the goal is to estimate the relationship between a univariate response variable Y and a predictor X

  • The result in Proposition 6 of [9] is extended from Gaussian to Student errors, showing that the inverse regression approach of Sliced Inverse Regression (SIR) is still valid outside the Gaussian case, meaning that the central subspace can still be estimated by maximum likelihood estimation of the inverse regression parameters

  • In all cases and tables, the performance of the different methods is assessed based on their ability to recover the central subspace, which is measured via the value of the proximity measure r (26)

Summary

Introduction

Let us consider a regression setting where the goal is to estimate the relationship between a univariate response variable Y and a predictor X. The inverse regression approach to dimensionality reduction gained rapid attention [8] and was generalized in [9], which showed the link between the axes spanning the central subspace and an inverse regression problem with Gaussian distributed errors. The result in Proposition 6 of [9] is extended from Gaussian to Student errors, showing that the inverse regression approach of SIR is still valid outside the Gaussian case, meaning that the central subspace can still be estimated by maximum likelihood estimation of the inverse regression parameters.
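To illustrate why Student errors yield robustness, recall the standard EM scheme for a multivariate Student distribution: the E-step computes latent precision weights that downweight observations with large Mahalanobis distance, and the M-step re-estimates location and scatter with those weights. The sketch below fits only a location/scatter pair with fixed degrees of freedom, as a generic textbook illustration; the paper's algorithm additionally handles the full inverse regression structure.

```python
import numpy as np

def fit_student_em(X, nu=3.0, n_iter=50):
    """EM for the location and scatter of a multivariate Student-t
    with fixed degrees of freedom nu (a standard textbook scheme)."""
    n, p = X.shape
    # Initialize with the (non-robust) Gaussian ML estimates
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    for _ in range(n_iter):
        # E-step: expected latent precision weights; points far from mu
        # (large Mahalanobis distance d2) receive small weights, which
        # is the source of the robustness to outliers
        diff = X - mu
        d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
        w = (nu + p) / (nu + d2)
        # M-step: weighted mean and weighted scatter
        mu = (w[:, None] * X).sum(axis=0) / w.sum()
        diff = X - mu
        Sigma = (w[:, None] * diff).T @ diff / n
    return mu, Sigma
```

On contaminated data, the estimated location stays close to the bulk of the observations while the ordinary sample mean is dragged toward the outliers.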

Multivariate generalized Student distributions
Student Sliced Inverse Regression
Maximum likelihood estimation via EM algorithm
Connection to Sliced Inverse Regression
Determination of the central subspace dimension
Simulation study
Simulation setup
Three different regression models are considered.
Real data application
Evaluation setting
Method SIR
Conclusion and future work
Proof of Proposition 1
Proof of Lemma 1
Proof of Proposition 2