Abstract

Under the logistic regression framework, we propose a forward-backward method, SODA, for variable selection with both main and quadratic interaction terms. In the forward stage, SODA adds predictors that have significant overall effects, whereas in the backward stage SODA removes unimportant terms to optimize the extended Bayesian information criterion (EBIC). Compared with existing methods for variable selection in quadratic discriminant analysis, SODA can deal with high-dimensional data in which the number of predictors is much larger than the sample size, and it does not require the joint normality assumption on the predictors, leading to much enhanced robustness. We further extend SODA to conduct variable selection and model fitting for general index models. Compared with existing variable selection methods based on sliced inverse regression (SIR), SODA requires neither the linearity nor the constant variance condition and is thus more robust. Our theoretical analysis establishes the variable-selection consistency of SODA under high-dimensional settings, and our simulation studies as well as real-data applications demonstrate the superior performance of SODA in dealing with non-Gaussian design matrices in both logistic and general index models. Supplementary materials for this article are available online.
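The forward-backward search driven by EBIC can be sketched in a few lines of Python. This is a generic illustration of the stepwise-EBIC idea only, not the actual SODA algorithm (which, among other things, also screens interaction terms among the selected variables); all function names here are ours, and EBIC is taken as -2*loglik + |S|*log(n) + 2*gamma*|S|*log(p) for a model using |S| of p candidate terms.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def neg2_loglik(X, y):
        # -2 x log-likelihood of a (nearly) unpenalized logistic fit;
        # with no columns, fall back to the intercept-only model.
        if X.shape[1] == 0:
            p1 = np.clip(y.mean(), 1e-12, 1 - 1e-12)
            return -2 * np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))
        prob = LogisticRegression(C=1e6, max_iter=1000).fit(X, y).predict_proba(X)[:, 1]
        prob = np.clip(prob, 1e-12, 1 - 1e-12)
        return -2 * np.sum(y * np.log(prob) + (1 - y) * np.log(1 - prob))

    def ebic(X, y, S, gamma=0.5):
        # EBIC_gamma = -2*loglik + |S|*log(n) + 2*gamma*|S|*log(p)
        n, p = X.shape
        return neg2_loglik(X[:, S], y) + len(S) * np.log(n) + 2 * gamma * len(S) * np.log(p)

    def forward_backward(X, y, gamma=0.5):
        S = []
        # Forward stage: greedily add the term that most reduces EBIC.
        while len(S) < X.shape[1]:
            cand = [(ebic(X, y, S + [j], gamma), j) for j in range(X.shape[1]) if j not in S]
            best, j = min(cand)
            if best >= ebic(X, y, S, gamma):
                break
            S.append(j)
        # Backward stage: drop terms whose removal reduces EBIC further.
        while S:
            cand = [(ebic(X, y, [s for s in S if s != j], gamma), j) for j in S]
            best, j = min(cand)
            if best >= ebic(X, y, S, gamma):
                break
            S.remove(j)
        return S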

Highlights

  • Classification, known as “supervised learning”, is a fundamental building block of statistical machine learning

  • We report in the Supplementary Materials a comparison between SODA and Lasso-logistic regression for variable selection when the underlying logistic regression model has only linear main effects; SODA was competitive with Lasso in all cases we tested and significantly outperformed Lasso when the “incoherence” condition (Ravikumar et al., 2010) was violated

  • We study variable and interaction selection for logistic regression with second-order terms, which covers QDA as a special case; a short derivation of this claim is sketched after this list

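The claim that QDA is a special case of quadratic logistic regression is the standard Gaussian calculation: if X | Y = k ~ N(mu_k, Sigma_k) for k = 0, 1, then the Bayes log-odds are quadratic in x,

    \log\frac{P(Y=1 \mid x)}{P(Y=0 \mid x)}
      = -\tfrac{1}{2}\, x^\top (\Sigma_1^{-1} - \Sigma_0^{-1})\, x
        + (\Sigma_1^{-1}\mu_1 - \Sigma_0^{-1}\mu_0)^\top x + c,

where the constant c collects the log prior-odds, the log-determinant ratio, and the mu_k^T Sigma_k^{-1} mu_k terms. Thus the QDA rule is exactly a logistic regression in the linear and second-order terms of x; when Sigma_1 = Sigma_0 the quadratic part vanishes and LDA (linear log-odds) is recovered.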

Summary

Introduction

Classification, known as “supervised learning”, is a fundamental building block of statistical machine learning. We applied LDA, logistic regression, and QDA to train classifiers, and estimated the classification accuracy using 1000 additional testing samples generated from the oracle model. Both LDA and logistic regression with only linear terms had poor prediction power, whereas QDA improved the classification accuracy dramatically. A direct application of Lasso-logistic regression with all second-order terms is computationally prohibitive for moderately large p (e.g., p ≥ 1000). To cope with this difficulty, Fan et al. (2015) proposed innovated interaction screening (IIS), based on transforming the original predictor vector.
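The LDA/logistic/QDA contrast described above is easy to reproduce with a small simulation. The oracle model below is an illustrative quadratic logistic model of our own choosing, not the design used in the paper; as in the paper, accuracy is estimated on 1000 additional testing samples.

    import numpy as np
    from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                               QuadraticDiscriminantAnalysis)
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    def simulate(n, p=5):
        # Illustrative oracle: logistic model with an interaction and a
        # pure quadratic term, so the true decision boundary is curved.
        X = rng.standard_normal((n, p))
        logit = X[:, 0] - X[:, 1] + 2.0 * X[:, 0] * X[:, 1] + X[:, 2] ** 2
        y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
        return X, y

    X_tr, y_tr = simulate(500)
    X_te, y_te = simulate(1000)  # 1000 additional testing samples

    for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                      ("logistic (linear terms)", LogisticRegression(max_iter=1000)),
                      ("QDA", QuadraticDiscriminantAnalysis())]:
        print(name, clf.fit(X_tr, y_tr).score(X_te, y_te))

On data like this, the two linear classifiers plateau well below QDA, since only QDA's quadratic boundary can track the interaction and squared terms.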

Method
  Quadratic logistic regression and extended BIC
  Stepwise variable and interaction selection
  Preliminary main effect selection
  Backward elimination
  Post-selection prediction for continuous response
  Implementation issues of SODA
Theoretical properties of SODA
  Logistic regression with interactions
  Continuous-response index models
  Prediction of continuous surface
Real data analysis
  Michigan lung cancer dataset
  Ionosphere dataset
  Pumadyn dataset
Concluding remarks