The Penalized Regression and Penalized Logistic Regression of Lasso and Elastic Net Methods for High-Dimensional Data: A Modelling Approach

Autcha Araveeporn

doi:10.9734/bpi/ist/v3/1695b

Abstract

The objective of this research is to compare parameter estimation in penalized regression and penalized logistic regression using the lasso, elastic net, adaptive lasso, and adaptive elastic net methods on high-dimensional data. Parameter estimation in the multiple linear regression model is an important problem when relating a dependent variable to a set of independent variables. Usually, the number of independent variables is smaller than the sample size, so ordinary least squares gives a unique solution. When the number of independent variables exceeds the sample size, the data are called high-dimensional, and traditional regression analysis no longer yields a unique solution. Penalized regression analysis is designed to overcome this problem. The computational part focuses on estimation with four penalized regression methods: the lasso, adaptive lasso, elastic net, and adaptive elastic net. The lasso (least absolute shrinkage and selection operator) adds a penalty term equal to the scaled sum of the absolute values of the coefficients. The elastic net combines the ridge regression and lasso penalties. Both the lasso and the elastic net can shrink coefficients to zero and therefore perform variable selection. The adaptive lasso and adaptive elastic net apply adaptive weights to the penalty term, based on the lasso and elastic net estimates, respectively. Each adaptive weight depends on a power of the corresponding initial estimate. Commonly, these methods are used to estimate parameters in linear regression models in which both the dependent and independent variables are measured on a continuous scale. Moreover, the same penalties can be applied to logistic regression to classify high-dimensional data.
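For reference, the four penalties described above can be written out as follows. This is a sketch using the standard textbook formulations, not necessarily the exact parameterization used in the full text; the symbols $\lambda$, $\lambda_1$, $\lambda_2$, $\gamma$, $w_j$, and the initial estimate $\tilde{\beta}_j$ are notation assumed here for illustration.

```latex
% y: n-vector of responses, X: n x p design matrix, beta: p-vector of coefficients
\begin{align*}
\hat{\beta}^{\text{lasso}}
  &= \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2
     + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \\
\hat{\beta}^{\text{enet}}
  &= \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2
     + \lambda_1 \sum_{j=1}^{p} \lvert \beta_j \rvert
     + \lambda_2 \sum_{j=1}^{p} \beta_j^2 \\
\hat{\beta}^{\text{alasso}}
  &= \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2
     + \lambda \sum_{j=1}^{p} w_j \lvert \beta_j \rvert,
     \qquad w_j = \frac{1}{\lvert \tilde{\beta}_j \rvert^{\gamma}},\ \gamma > 0 \\
\hat{\beta}^{\text{aenet}}
  &= \arg\min_{\beta}\; \lVert y - X\beta \rVert_2^2
     + \lambda_1 \sum_{j=1}^{p} w_j \lvert \beta_j \rvert
     + \lambda_2 \sum_{j=1}^{p} \beta_j^2
\end{align*}
```

Here $\tilde{\beta}_j$ denotes an initial estimate (the lasso estimate for the adaptive lasso, the elastic net estimate for the adaptive elastic net), so coefficients with small initial estimates receive larger penalties.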
The penalized logistic regression model is used to classify a categorical dependent variable on the basis of the independent variables. Here the dependent variable is binary and the independent variables are continuous. In the simulation study, the independent variables are generated from normal distributions with variances of 20, 30, 40, and 50, with sample sizes smaller than the number of independent variables. For penalized regression, the comparison criterion is the average mean square error. For penalized logistic regression, performance is compared using the average percentage of accuracy.
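The following is a minimal sketch of how lasso and elastic net estimates can be computed in the high-dimensional setting described in the abstract. It is not the author's implementation. It assumes a coordinate descent solver with the objective (1/(2n))·RSS + lam·(alpha·L1 + (1 − alpha)/2·L2), no intercept, and unstandardized predictors. The function names, the sample size n = 20, the number of predictors p = 40, the true coefficients, and the value lam = 1.0 are all hypothetical choices made for illustration; only the predictor variance of 20 is taken from the abstract.

```python
import random

def soft_threshold(z, gamma):
    """Soft-thresholding operator: shrink z toward zero by gamma."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def elastic_net_cd(X, y, lam, alpha=1.0, n_iter=200):
    """Coordinate descent for the elastic net (alpha=1.0 gives the lasso).

    Assumed objective: RSS/(2n) + lam*(alpha*L1 + 0.5*(1-alpha)*L2).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X @ beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (excluding j).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def objective(X, y, beta, lam, alpha=1.0):
    """Penalized objective minimized by elastic_net_cd."""
    n = len(X)
    rss = sum((y[i] - sum(x * b for x, b in zip(X[i], beta))) ** 2 for i in range(n))
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return rss / (2 * n) + lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)

def mse(X, y, beta):
    """Mean square error of the fitted values."""
    n = len(X)
    return sum((y[i] - sum(x * b for x, b in zip(X[i], beta))) ** 2 for i in range(n)) / n

# Hypothetical simulation: n < p, predictors drawn from N(0, 20).
random.seed(0)
n, p = 20, 40
sd = 20 ** 0.5
X = [[random.gauss(0.0, sd) for _ in range(p)] for _ in range(n)]
true_beta = [2.0, -1.5, 1.0] + [0.0] * (p - 3)  # assumed sparse truth
y = [sum(x * b for x, b in zip(X[i], true_beta)) + random.gauss(0.0, 1.0) for i in range(n)]

lasso_beta = elastic_net_cd(X, y, lam=1.0, alpha=1.0)
enet_beta = elastic_net_cd(X, y, lam=1.0, alpha=0.5)

for name, b in (("lasso", lasso_beta), ("elastic net", enet_beta)):
    nonzero = sum(1 for v in b if v != 0.0)
    print(name, "nonzero coefficients:", nonzero, "in-sample MSE:", mse(X, y, b))
```

In a comparison like the one the abstract describes, the fit would be repeated over many simulated data sets and the mean square errors averaged. The adaptive variants could be sketched by rescaling each column of X by the inverse of its adaptive weight before calling the same solver.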
