Using Generalized Correlation to Effect Variable Selection in Very High Dimensional Problems

Peter Hall, Hugh Miller

doi:10.1198/jcgs.2009.08041

Abstract

Using the traditional linear model to implement variable selection can perform very effectively in some cases, provided the response to relevant components is approximately monotone and its gradient changes only slowly. In other circumstances, nonlinearity of response can result in significant vector components being overlooked. Even if good results are obtained by linear model fitting, they can sometimes be bettered by using a nonlinear approach. These circumstances can arise in practice, with real data, and they motivate alternative methodologies. We suggest an approach based on ranking generalized empirical correlations between the response variable and components of the explanatory vector. This technique is not prediction-based, and can identify variables that are influential but not explicitly part of a predictive model. We explore the method’s performance for real and simulated data, and give a theoretical argument demonstrating its validity. The method can also be used in conjunction with, rather than as an alternative to, conventional prediction-based variable selections, by providing a preliminary “massive dimension reduction” step as a prelude to using alternative techniques (e.g., the adaptive lasso) that do not always cope well with very high dimensions. Supplemental materials relating to the numerical sections of this paper are available online.
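The abstract describes ranking generalized empirical correlations between the response and each component of the explanatory vector, then keeping the top-ranked components as a "massive dimension reduction" step. The sketch below illustrates that idea under assumptions the abstract does not state. The class of transformations h is assumed to be polynomials of degree at most d (default 3, a hypothetical choice). Under that assumption, the largest attainable sample correlation between the response and h(X_j) equals the multiple correlation coefficient from least-squares regression of the response on 1, X_j, ..., X_j^d. The function names `generalized_correlation`, `rank_by_generalized_correlation` and `screen` are hypothetical. This is not the authors' implementation, and it omits any assessment of how stable the ranking is.

```python
import math
import random


def _center(v):
    """Return v minus its sample mean."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def generalized_correlation(x, y, degree=3, tol=1e-10):
    """Largest sample correlation between y and h(x), over polynomials h of degree <= degree.

    ASSUMPTION: the function class is polynomials. The maximal correlation then
    equals the multiple correlation from regressing y on 1, x, ..., x**degree,
    computed here by projecting centered y onto Gram-Schmidt-orthonormalized powers of x.
    """
    xc = _center(x)
    sd = math.sqrt(_dot(xc, xc) / len(x))
    yc = _center(y)
    yy = _dot(yc, yc)
    if sd == 0.0 or yy == 0.0:
        return 0.0  # a constant component or response carries no correlation
    z = [a / sd for a in xc]  # standardize x so its powers stay well scaled
    basis = []  # orthonormal, centered polynomial columns
    for k in range(1, degree + 1):
        col = _center([a ** k for a in z])
        for q in basis:  # modified Gram-Schmidt against earlier columns
            c = _dot(col, q)
            col = [a - c * b for a, b in zip(col, q)]
        norm = math.sqrt(_dot(col, col))
        if norm > tol * math.sqrt(len(z)):  # skip numerically dependent columns
            basis.append([a / norm for a in col])
    explained = sum(_dot(yc, q) ** 2 for q in basis)  # squared norm of the projection
    return math.sqrt(min(explained / yy, 1.0))


def rank_by_generalized_correlation(columns, y, degree=3):
    """Return component indices sorted by decreasing score, plus the scores.

    `columns` is a hypothetical column-wise layout: columns[j] holds the n values of X_j.
    """
    scores = [generalized_correlation(col, y, degree) for col in columns]
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order, scores


def screen(columns, y, keep, degree=3):
    """Hypothetical screening step: indices of the `keep` top-ranked components."""
    order, _ = rank_by_generalized_correlation(columns, y, degree)
    return order[:keep]


if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 200, 50
    cols = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(p)]
    # Simulated response that depends on component 0 only through its square.
    y = [cols[0][i] ** 2 + 0.5 * rng.gauss(0.0, 1.0) for i in range(n)]
    print("linear score, component 0:", round(generalized_correlation(cols[0], y, degree=1), 3))
    print("cubic score, component 0:", round(generalized_correlation(cols[0], y, degree=3), 3))
    print("top 5 after screening:", screen(cols, y, keep=5))
```

In the simulated example the response depends on the first component only through its square, so the ordinary (degree-1) correlation for that component has expected value zero, while the polynomial score is large. This is the kind of nonlinear effect that, according to the abstract, a linear fit can overlook. The indices returned by `screen` could then be passed to a prediction-based method such as the adaptive lasso.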

More From: Journal of Computational and Graphical Statistics

Journal: Journal of Computational and Graphical Statistics
Publication Date: Jan 1, 2009
Citations: 175

Similar Papers

Genomic Prediction in Animals and Plants: Simulation of Data, Validation, Reporting, and Benchmarking
Hans D Daetwyler ... Ricardo Pong-Wong
Genetics | VOL. 193
01 Feb 2013

Adjusted Adaptive LASSO in High-dimensional Poisson Regression Model
Zakariya Y Algamal ... Muhammad H Lee
Modern Applied Science | VOL. 9
11 Jan 2015

Two-step approach for assessing the health effects of environmental chemical mixtures: application to simulated datasets and real data from the Navajo Birth Cohort Study
Li Luo ... Laurie G Hudson
Environmental Health | VOL. 18
09 May 2019

Extreme Rainfall Prediction using Bayesian Quantile Regression in Statistical Downscaling Modeling
Ro’Fah Nur Rachmawati ... Anita Rahayu
Procedia Computer Science | VOL. 157
01 Jan 2019
