Abstract

In regression with a high-dimensional predictor vector, dimension reduction methods aim to replace the predictor with a lower-dimensional version without loss of information on the regression. In this context, the so-called central mean subspace is the key object of dimension reduction. The last two decades have seen the emergence of many methods to estimate the central mean subspace. In this paper, we go one step further and study the performance of a k-nearest neighbor type estimate of the regression function, based on an estimator of the central mean subspace. In our setting, the predictor lies in ℝ^p with fixed p, i.e. p does not depend on the sample size. The estimate is first proved to be consistent. The improvement due to the dimension reduction step is then quantified in terms of its rate of convergence. All the results are distribution-free. As an application, we give an explicit rate of convergence using the SIR method. The method is illustrated by a simulation study.

Highlights

  • In full generality, the goal of regression is to infer the conditional law of the response variable Y given the ℝ^p-valued predictor X

  • Several methods have been introduced to estimate the dimension-reduction subspace: sliced inverse regression (SIR; Li [13]), sliced average variance estimation (SAVE; Cook and Weisberg [6]), average derivative estimation (ADE; Härdle and Stoker [10]), among others. See Cook and Weisberg [7] for an introductory account of studying regression via these methods

  • The central mean subspace, which exists under mild conditions, is the target of sufficient dimension reduction for the mean response E[Y | X]; a sketch of how SIR estimates such a subspace is given after this list
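As a concrete illustration of the estimation step behind these methods, here is a minimal sketch of sliced inverse regression (SIR). The function name sir_directions, the NumPy-based implementation, and the default number of slices are illustrative assumptions, not the paper's code.

```python
import numpy as np

def sir_directions(X, Y, n_slices=10, d=1):
    """Estimate d directions of the dimension-reduction subspace by SIR.

    Whitens X, averages it within slices of the ordered Y, and extracts
    the leading eigenvectors of the between-slice covariance, mapped
    back to the original predictor scale.
    """
    n, p = X.shape
    mu = X.mean(axis=0)
    Sigma = np.cov(X, rowvar=False)  # assumed positive definite
    # Whiten the predictors: Z = Sigma^{-1/2} (X - mu)
    eigval, eigvec = np.linalg.eigh(Sigma)
    Sigma_inv_sqrt = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    Z = (X - mu) @ Sigma_inv_sqrt
    # Slice the sample according to the order of Y
    order = np.argsort(Y)
    slices = np.array_split(order, n_slices)
    # Weighted covariance of the slice means of Z
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # Top-d eigenvectors of M, back-transformed to the X scale
    w, v = np.linalg.eigh(M)
    B = Sigma_inv_sqrt @ v[:, ::-1][:, :d]
    return B  # columns span the estimated subspace
```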

Summary

Introduction

The goal of regression is to infer the conditional law of the response variable Y given the ℝ^p-valued predictor X. The central mean subspace, which exists under mild conditions (see Cook [1,2,3]), is the target of sufficient dimension reduction for the mean response E[Y | X]. Assuming the existence of a mean dimension-reduction subspace as in (1.1), we first construct in Section 2 the k-NN type estimator based on an estimate Λ̂ of Λ. Roughly speaking, it is defined as the k-NN regression estimate drawn from the (Λ̂Xi, Yi)'s.
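To make this construction concrete, the following sketch chains the SIR step shown earlier with a k-NN regression on the projected predictors, in the spirit of the estimator drawn from the (Λ̂Xi, Yi)'s. The helper name nn_sir_predict and the use of scikit-learn's KNeighborsRegressor are illustrative assumptions, not the paper's implementation; it reuses the sir_directions sketch above.

```python
from sklearn.neighbors import KNeighborsRegressor

def nn_sir_predict(X_train, Y_train, X_test, d=1, k=5, n_slices=10):
    """k-NN regression after a SIR-based dimension reduction step.

    1. Estimate a basis B of the mean dimension-reduction subspace via SIR.
    2. Replace each predictor X_i by its low-dimensional image B^T X_i.
    3. Run ordinary k-NN regression in the reduced space.
    """
    B = sir_directions(X_train, Y_train, n_slices=n_slices, d=d)
    knn = KNeighborsRegressor(n_neighbors=k)
    knn.fit(X_train @ B, Y_train)
    return knn.predict(X_test @ B)
```

In this sketch, the reduced dimension d, the number of neighbors k, and the number of slices are tuning parameters left to the user, e.g. chosen by cross-validation.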

  • The estimator
  • Behavior of r̂
  • Construction of Λ̂
  • Rate of convergence
  • Theoretical results
  • Statistical methodology: the NN-SIR method
  • A small simulation study
  • Preliminaries