Abstract

In this paper we develop a sure independence screening method based on hypothesis testing (HT-SIS) in a general nonparametric regression model. The ranking utility is based on a powerful test statistic for the hypothesis of predictive significance of each available covariate. The sure screening property of HT-SIS is established, demonstrating that all active predictors are retained with high probability as the sample size increases. The threshold parameter is chosen in a theoretically justified manner based on the desired false positive selection rate. Simulation results suggest that, for several models, the proposed method performs competitively against procedures from the screening literature and outperforms them in some scenarios. A real dataset of microarray gene expressions is analyzed.

Highlights

  • In recent years, fast advances in technology and data collection have facilitated the acquisition of high-dimensional data in several areas of research

  • Based on a sample of n i.i.d. observations from model (1), we propose ranking the utility of the covariates marginally, using the test statistic introduced by Zambom and Akritas (2014) [20]

  • In this paper we propose a screening method based on a test statistic for the hypothesis that a covariate is influential in the prediction of the response variable

Summary

Introduction

Fast advances in technology and data collection have facilitated the acquisition of high-dimensional data in several areas of research. The theoretical properties of this procedure were obtained under the strong assumption of a linear model. If this assumption is not accurate, predictors with high predictive significance whose effects are nonlinear might not be detected. To identify nonlinear effects in a regression model, Fan, Feng and Song (2011) [11] considered nonparametric independence screening (NIS) with an additive model, ranking the utility of the covariates with $E\,m_j^2(X_j)$. In this paper we propose a novel screening method that, unlike the procedures in the literature, is based on a test statistic for the hypothesis that each available predictor has predictive significance. The proposed method is developed under a very general heteroscedastic nonparametric regression model, which does not require strong assumptions such as linearity or additivity of the mean regression function.
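The general idea of marginal screening, ranking each covariate by a utility computed from that covariate alone and keeping the top-ranked ones, can be illustrated with a minimal sketch. The utility used here is a one-way ANOVA F statistic over quantile bins of each covariate; it is a simple stand-in chosen for illustration and is not the Zambom and Akritas (2014) statistic used by HT-SIS. The function names, the number of bins, the simulated model and the cutoff `keep` are all assumptions. Keeping a fixed number of covariates is also a simplification of the threshold choice described in the paper.

```python
import numpy as np


def binned_f_statistic(x, y, n_bins=5):
    """One-way ANOVA F statistic of y across quantile bins of x.

    Stand-in marginal utility (an assumption of this sketch), not the
    test statistic of the paper.
    """
    # Interior quantiles of x define n_bins roughly equal-sized bins.
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    labels = np.searchsorted(edges, x, side="right")
    grand_mean = y.mean()
    between, within, groups = 0.0, 0.0, 0
    for k in range(n_bins):
        yk = y[labels == k]
        if yk.size == 0:
            continue  # skip empty bins (possible when x has many ties)
        groups += 1
        between += yk.size * (yk.mean() - grand_mean) ** 2
        within += ((yk - yk.mean()) ** 2).sum()
    return (between / (groups - 1)) / (within / (y.size - groups))


def screen(X, y, keep):
    """Rank covariates by the marginal utility and keep the top `keep`."""
    stats = np.array([binned_f_statistic(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(stats)[::-1]  # indices sorted by decreasing utility
    return order[:keep], stats


# Hypothetical simulated example: p = 500 covariates, two of them active,
# both entering the mean function nonlinearly.
rng = np.random.default_rng(0)
n, p = 200, 500
X = rng.uniform(-2.0, 2.0, size=(n, p))
y = X[:, 0] ** 2 + 2.0 * np.sin(X[:, 1]) + 0.5 * rng.standard_normal(n)

selected, stats = screen(X, y, keep=20)
print(sorted(selected.tolist()))
```

In this simulated setting the quadratic effect of the first covariate has essentially zero linear correlation with the response, which is the kind of signal a screening rule based on a linear model can miss and a nonparametric marginal utility can pick up.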

The model and preliminary results
The screening procedure and main results
The choice of the threshold parameter
Simulation study
Real data application
Discussion
Auxiliary lemmas
Findings
Proofs of lemmas and theorems