Abstract

Variable selection is an important topic in linear regression analysis. In practice, a large number of predictors is usually introduced at the initial stage of modeling to attenuate possible modeling biases. Stepwise deletion and subset selection are commonly used, but they can be computationally expensive and ignore stochastic errors in the variable selection process. In addition, best-subset selection suffers from several disadvantages, the most severe of which is its lack of stability. In this article, penalized likelihood approaches are proposed to handle these problems. The proposed methods select variables and estimate coefficients simultaneously, using penalty functions that produce sparse solutions. Based on the RMSE and Generalized Information Criterion (GIC), the factors affecting Indonesian mathematics scores were identified: LASSO retains 11 important variables while SCAD retains 6, so the LASSO model is more complex than the SCAD model. MCP produces an even simpler model with 5 important variables but exhibits excessive bias. The results also show that the SCAD penalty performs best compared with LASSO, Ridge, and MCP, while the Ridge penalty performs worst on all criteria.
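The contrast between penalties that select variables and penalties that only shrink can be sketched with a minimal example. The snippet below is an illustration on synthetic data (not the article's Indonesian mathematics data) using scikit-learn's `Lasso` and `Ridge`; SCAD and MCP are nonconvex penalties not available in scikit-learn, so they are omitted here. The regularization strengths are arbitrary choices for demonstration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))

# Only the first 3 of 20 predictors truly matter.
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: selects variables
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks, never zeroes

# Count nonzero coefficients: LASSO sets irrelevant ones exactly to
# zero, performing selection; Ridge keeps all p predictors in the model.
n_lasso = int(np.sum(np.abs(lasso.coef_) > 1e-8))
n_ridge = int(np.sum(np.abs(ridge.coef_) > 1e-8))
print(n_lasso, n_ridge)
```

Because the Ridge solution is never exactly sparse, it cannot drop predictors, which is consistent with its weak performance for variable selection reported above; the L1-type penalties trade some bias for a simpler model.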
