Abstract

In the theory of sufficient dimension reduction, Sliced Inverse Regression (SIR) is a well-known technique for reducing the dimensionality of regression problems. This semiparametric regression method aims at determining linear combinations of a p-dimensional explanatory variable x that are related to a response variable y. However, it relies on a crucial condition on the marginal distribution of the predictor x, often called the linearity condition. From both a theoretical and a practical point of view, this condition is a limitation. Following an idea of Li, Cook, and Nachtsheim (2004) in the Ordinary Least Squares framework, we propose in this article to cluster the predictor space so that the linearity condition approximately holds within each partition. We then apply SIR in each cluster and finally estimate the dimension reduction subspace by combining these individual estimates. We give asymptotic properties of the corresponding estimator. A simulation study shows that the proposed approach, referred to as cluster-based SIR, improves the estimation of the e.d.r. basis. We also propose an iterative implementation of cluster-based SIR and show in simulations that it further improves the quality of the estimator. Finally, the methodology is applied to the horse mussel data, and a comparison of the predictions obtained on test samples shows the superiority of cluster-based SIR over SIR.
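To make the cluster-then-combine idea concrete, the sketch below implements a minimal version of the procedure described above: standard SIR (slicing the response, eigen-decomposing the covariance of standardized slice means) applied within k-means clusters of the predictor space, followed by one simple way of pooling the per-cluster directions. The choice of k-means, the number of clusters and slices, and the eigen-based combination step are illustrative assumptions, not the authors' exact estimator.

```python
import numpy as np
from sklearn.cluster import KMeans


def sir_directions(X, y, n_slices=5, n_dir=1):
    """Basic SIR: eigen-decompose the covariance of slice means of the
    standardized predictors, then map the directions back to the X scale."""
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)
    # Symmetric inverse square root of Sigma for standardization
    w, V = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    Z = (X - mu) @ Sigma_inv_sqrt
    # Slice on the ordered response and accumulate weighted slice means
    order = np.argsort(y)
    M = np.zeros((p, p))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Leading eigenvectors of M span the e.d.r. space in Z coordinates
    vals, vecs = np.linalg.eigh(M)
    beta = Sigma_inv_sqrt @ vecs[:, ::-1][:, :n_dir]
    return beta / np.linalg.norm(beta, axis=0)


def cluster_based_sir(X, y, n_clusters=3, n_slices=5, n_dir=1, seed=0):
    """Sketch of cluster-based SIR: partition the predictor space, run SIR
    within each cluster, and combine the per-cluster estimates."""
    labels = KMeans(n_clusters=n_clusters, n_init=10,
                    random_state=seed).fit_predict(X)
    pooled = np.zeros((X.shape[1], X.shape[1]))
    for k in range(n_clusters):
        mask = labels == k
        B_k = sir_directions(X[mask], y[mask], n_slices, n_dir)
        # Weight each cluster's contribution by its sample proportion
        pooled += mask.mean() * (B_k @ B_k.T)
    # One possible combination: principal directions of the pooled projector
    vals, vecs = np.linalg.eigh(pooled)
    return vecs[:, ::-1][:, :n_dir]
```

In this sketch, each cluster is assumed large enough for its sample covariance to be invertible; in practice one would guard against small or degenerate clusters before inverting.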
