Conditional Sure Independence Screening

Emre Barut, Jianqing Fan, Anneleen Verhasselt

doi:10.1080/01621459.2015.1092974

Abstract

Independence screening is powerful for variable selection when the number of variables is massive. Commonly used independence screening methods are based on marginal correlations or their variants. When some prior knowledge on a certain important set of variables is available, a natural assessment of the relative importance of the other predictors is their conditional contributions to the response given the known set of variables. This results in conditional sure independence screening (CSIS). CSIS produces a rich family of alternative screening methods by different choices of the conditioning set and can help reduce the number of false positive and false negative selections when covariates are highly correlated. This article proposes and studies CSIS in generalized linear models. We give conditions under which sure screening is possible and derive an upper bound on the number of selected variables. We also spell out the situation under which CSIS yields model selection consistency and the properties of CSIS when a data-driven conditioning set is used. Moreover, we provide two data-driven methods to select the thresholding parameter of conditional screening. The utility of the procedure is illustrated by simulation studies and analysis of two real datasets. Supplementary materials for this article are available online.

Highlights

Statisticians are nowadays frequently confronted with massive data sets from various frontiers of scientific research.
A natural assessment of the relative importance of the other predictors is the conditional contribution of each individual predictor in the presence of the known set of variables. This results in conditional sure independence screening (CSIS).
Over the last ten years, there have been many exciting developments in statistics and machine learning on variable selection techniques for ultrahigh-dimensional feature spaces.

Summary

INTRODUCTION

Statisticians are nowadays frequently confronted with massive data sets from various frontiers of scientific research. Consider the linear model (1) again with sparse regression coefficients β⋆ = (10, 0, ..., 0, 1)^T and equi-correlation 0.9 among all covariates except X_2000, which is independent of the rest of the covariates. By using the conditional screening approach, in which the covariate X_1 is conditioned upon (used in the joint fit), the marginal utilities of the spurious variables are significantly reduced. The distributions of the average magnitude of the conditional fitted coefficients {|β^M_{Cj}|}, j = 2, ..., 1999, and of |β^M_{C2000}| are shown in the middle panel of Figure 2. As shown by Fan and Lv (2008) and Fan and Song (2010), for a given threshold of marginal utility, the number of selected variables depends on the correlation among covariates, as measured by the largest eigenvalue of Σ, λmax(Σ).
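The effect of conditioning on X_1 in this example can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: the sample size n = 100, the standard normal noise, the Gaussian covariates, the seed, and the function names `simulate`, `marginal_coefs`, and `conditional_coefs` are all assumptions made for illustration. The conditional coefficient of X_j in the joint fit on (1, X_1, X_j) is computed by residualizing both y and X_j on (1, X_1), which gives the same least-squares coefficient as the joint fit.

```python
import numpy as np

def simulate(n=100, p=2000, rho=0.9, seed=0):
    """Draw (X, y) from the linear model above; n, noise law and seed are assumptions."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((n, 1))      # common factor inducing equi-correlation rho
    X = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * rng.standard_normal((n, p))
    X[:, -1] = rng.standard_normal(n)         # last covariate is independent of the rest
    beta = np.zeros(p)
    beta[0], beta[-1] = 10.0, 1.0             # beta* = (10, 0, ..., 0, 1)^T
    y = X @ beta + rng.standard_normal(n)
    return X, y

def marginal_coefs(X, y):
    """Slope of y on each X_j alone (with intercept)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return (Xc.T @ yc) / (Xc ** 2).sum(axis=0)

def conditional_coefs(X, y, cond):
    """Coefficient of X_j in the fit of y on (1, X_cond, X_j), for every j not in cond."""
    n, p = X.shape
    Z = np.column_stack([np.ones(n), X[:, cond]])
    Q, _ = np.linalg.qr(Z)                    # orthonormal basis of the conditioning space
    ry = y - Q @ (Q.T @ y)                    # residual of y given (1, X_cond)
    RX = X - Q @ (Q.T @ X)                    # residual of each column given (1, X_cond)
    rest = np.setdiff1d(np.arange(p), cond)
    coefs = np.full(p, np.nan)                # conditioned columns have no coefficient here
    coefs[rest] = (RX[:, rest].T @ ry) / (RX[:, rest] ** 2).sum(axis=0)
    return coefs

X, y = simulate()
marg_coef = marginal_coefs(X, y)
cond_coef = conditional_coefs(X, y, cond=[0])  # condition on X_1 (index 0)

spurious = np.arange(1, X.shape[1] - 1)        # X_2, ..., X_1999
print("mean |marginal| over spurious variables:   ", np.abs(marg_coef[spurious]).mean())
print("mean |conditional| over spurious variables:", np.abs(cond_coef[spurious]).mean())
print("conditional coefficient of the last covariate:", cond_coef[-1])
```

Under these assumptions, the marginal slopes of the spurious variables are inflated by their correlation with X_1, whereas their conditional coefficients should concentrate near zero. Individual conditional estimates remain noisy at this sample size, so the sketch compares average magnitudes rather than the ranking of single variables.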

Generalized Linear Models

Conditional Screening

SURE SCREENING PROPERTIES

Properties on Population Level

Properties on Sample Level

SELECTION OF THE THRESHOLDING PARAMETER

Controlling FDR

Random Decoupling

Simulation Study

Normal model

Binomial model

Robustness of CSIS

Leukemia Data

Financial Data

Proof of Theorem 1

Proof of Theorem 2

Proof of Theorem 3

The Fisher information

Proof of Theorem 4

Findings

Proof of Theorem 5

Journal: Journal of the American Statistical Association	Publication Date: Jul 2, 2016
Citations: 101	License type: cc-by
