Abstract

Reducing high-dimensional binary classification data with penalized logistic regression is challenging when the explanatory variables are correlated. To estimate the coefficients and perform variable selection simultaneously, the elastic net penalty has been applied successfully in high-dimensional binary classification. However, the elastic net has two major limitations. First, it does not encourage a grouping effect when the correlation is not high. Second, it is not consistent in variable selection. To address these issues, an adjusted elastic net (AEN) and its adaptive version, the adaptive adjusted elastic net (AAEN), are proposed to take into account small and medium correlations between explanatory variables and to provide consistent variable selection at the same time. Our simulation and real data results show that AEN and AAEN have an advantage under small, medium, and extremely high correlation among variables, in terms of both prediction and variable selection consistency, compared with other existing penalized methods.

Highlights

  • With the advancement of technology, massive amounts of data with increasing dimensions have been generated in many areas such as genetics, medicine, economics, and the social sciences

  • The adjusted elastic net (AEN) and AAEN clearly show less variability than the elastic net

  • In terms of false positives (FP), the AAEN and AEN methods select fewer ineffective variables than the elastic net in most cases. Our simulation results indicate that AAEN and AEN perform best in terms of misclassification error on the test data (MEt, where smaller values are better), hits, and FP, followed by the elastic net, under small, medium, and extremely high correlation, and that they have a greater advantage in variable selection with grouping effects in the logistic regression model


Summary

Introduction

With the advancement of technology, massive amounts of data with increasing dimensions have been generated in many areas such as genetics, medicine, economics, and the social sciences. “High-dimensional data” refers to the situation where the number of variables measured is greater than the number of observations in the data. This differs from traditional datasets for statistical analysis, where we have many observations on a few variables. The least absolute shrinkage and selection operator (LASSO) was proposed by Tibshirani (1996) to estimate the regression coefficients through an ℓ1-norm penalty. An adjusted elastic net (AEN) and its adaptive version, the adaptive adjusted elastic net (AAEN), are proposed to take into account small and medium correlations between explanatory variables and to provide consistent variable selection at the same time.
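To make the penalties named above concrete, the following is a minimal sketch of the objective that a penalized logistic regression minimizes when the LASSO or the standard elastic net penalty is used. It is not the AEN or AAEN penalty proposed in the paper, whose form is not given in this summary. The function names, the averaging of the log-likelihood, and the parameterization lam * (alpha * ||beta||_1 + (1 - alpha) * ||beta||_2^2) are assumptions made for illustration; other scalings of the ridge term are also in common use.

```python
import math


def elastic_net_penalty(beta, lam, alpha):
    """Elastic net penalty under one common (assumed) parameterization.

    alpha = 1 reduces to the LASSO (l1-norm) penalty,
    alpha = 0 reduces to the ridge (squared l2-norm) penalty.
    """
    l1 = sum(abs(b) for b in beta)
    l2_squared = sum(b * b for b in beta)
    return lam * (alpha * l1 + (1.0 - alpha) * l2_squared)


def logistic_nll(X, y, beta0, beta):
    """Average negative log-likelihood of a logistic model with labels in {0, 1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        eta = beta0 + sum(b * x for b, x in zip(beta, xi))
        # log(1 + exp(eta)), written to avoid overflow for large |eta|
        log1p_exp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        total += log1p_exp - yi * eta
    return total / len(y)


def penalized_objective(X, y, beta0, beta, lam, alpha):
    """Loss plus penalty; the intercept beta0 is left unpenalized."""
    return logistic_nll(X, y, beta0, beta) + elastic_net_penalty(beta, lam, alpha)
```

Minimizing `penalized_objective` over `beta0` and `beta` with `alpha = 1` gives a LASSO-penalized fit, while intermediate values of `alpha` mix the ℓ1 and squared ℓ2 terms, which is what lets the elastic net keep groups of highly correlated variables together.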

Penalized Logistic Regression Model
Adjusted Elastic Net Penalty
Simulation Study
Simulation Results
Real Data Results
Conclusion
