Abstract

We introduce a two-step procedure for identifying nonzero and linear components in ultrahigh-dimensional additive models. First, we develop a sure independence screening procedure, based on the distance correlation between each predictor and the marginal distribution function of the response variable, that reduces the dimensionality of the feature space to a moderate scale. Second, a double-penalization procedure simultaneously identifies the nonzero and linear components. We evaluate the numerical performance of the proposed method in extensive simulation experiments and illustrate it on a cardiomyopathy microarray data set. The numerical studies confirm that the proposed method performs well for various semiparametric models.

Highlights

  • Suppose we have a random sample (yi, xi1, . . . , xip), 1 ≤ i ≤ n, where yi is the response variable and (xi1, . . . , xip) is a p-dimensional covariate vector

  • We propose a more robust approach, called robust distance correlation sure independence screening (RDC-SIS), that reduces dimensionality by ranking each covariate according to its distance correlation with the marginal distribution function of the response variable

  • We propose a robust feature screening procedure for model (2) using the distance correlation between the predictors and the marginal distribution function of the response variable


Summary

Introduction

Suppose we have a random sample (yi, xi1, . . . , xip), 1 ≤ i ≤ n, where yi is the response variable and (xi1, . . . , xip) is a p-dimensional covariate vector. We propose a more robust approach, called robust distance correlation sure independence screening (RDC-SIS), that reduces dimensionality by ranking each covariate according to its distance correlation with the marginal distribution function of the response variable. This method is model-free, and we can expect the procedure to work well for a skewed or heavy-tailed response variable. We propose a robust feature screening procedure for model (2) using the distance correlation between the predictors and the marginal distribution function of the response variable. For two univariate normal random variables U and V with Pearson correlation coefficient ρ, Szekely, Rizzo, and Bakirov [18] showed that dcorr(U, V) is strictly increasing in |ρ|. This property implies that the distance-correlation-based feature screening procedure is equivalent to marginal Pearson correlation learning for linear regression with normally distributed predictors and random error.
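The screening step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (empirical_cdf, distance_correlation, rdc_sis) are hypothetical, the sample distance correlation is the standard double-centering estimator, and the default submodel size d = floor(n / log n) is a convention common in the screening literature that this excerpt does not confirm. Whether the paper ranks by the distance correlation or its square is also not stated here; the ordering is the same either way.

```python
import numpy as np


def empirical_cdf(y):
    """Empirical distribution function F_n evaluated at each observation y_i."""
    y = np.asarray(y, dtype=float)
    # Count of observations <= y_i, divided by n.
    return np.searchsorted(np.sort(y), y, side="right") / y.size


def distance_correlation(u, v):
    """Sample distance correlation of two univariate samples of equal length."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Pairwise absolute distances within each sample.
    a = np.abs(u[:, None] - u[None, :])
    b = np.abs(v[:, None] - v[None, :])
    # Double centering: subtract row and column means, add back the grand mean.
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    if denom == 0:
        return 0.0  # a constant sample carries no dependence information
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def rdc_sis(X, y, d=None):
    """Rank the columns of X by distance correlation with F_n(y); keep the top d.

    The default d = floor(n / log n) is an assumption, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    if d is None:
        d = int(n / np.log(n))
    d = min(d, p)
    fy = empirical_cdf(y)
    scores = np.array([distance_correlation(X[:, k], fy) for k in range(p)])
    selected = np.argsort(-scores)[:d]  # indices of the d largest scores
    return selected, scores


if __name__ == "__main__":
    # Simulated data, for illustration only: two active predictors and
    # heavy-tailed (t with 2 degrees of freedom) noise.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 500))
    y = 3 * X[:, 0] + 3 * X[:, 1] ** 2 + rng.standard_t(df=2, size=200)
    selected, scores = rdc_sis(X, y)
    print("retained predictors:", np.sort(selected))
```

Because F_n(y) depends on the response only through its ranks, the scores are unchanged by any strictly increasing transformation of y, which is one way to see why the procedure can be expected to tolerate skewed or heavy-tailed responses.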

Variable Selection and Structure Identification
Simulation Studies
Data Analysis
Conclusion