Abstract

Variable selection is an important topic in linear regression analysis. In practice, a large number of predictors is usually introduced at the initial stage of modeling to attenuate possible modeling biases. Stepwise deletion and subset selection are commonly used, but they can be computationally expensive and ignore stochastic errors in the variable selection process. In addition, best-subset selection suffers from several disadvantages, the most severe of which is its lack of stability. In this article, penalized likelihood approaches are proposed to handle these problems. The proposed methods select variables and estimate coefficients simultaneously, using penalty functions that produce sparse solutions. Based on the RMSE and Generalized Information Criterion (GIC), the factors affecting Indonesian mathematics scores were identified: LASSO retains 11 important variables while SCAD retains 6, so the LASSO model is more complex than the SCAD model. MCP produces an even simpler model with 5 important variables but exhibits excessive bias. The results also show that the SCAD penalty performs best compared with LASSO, Ridge, and MCP, while the Ridge penalty performs worst on all criteria.
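The contrast between penalties that select variables and penalties that only shrink can be sketched with a minimal example. The snippet below is an illustration on synthetic data (not the article's Indonesian mathematics data) using scikit-learn's `Lasso` and `Ridge`; SCAD and MCP are nonconvex penalties not available in scikit-learn, so they are omitted here. The regularization strengths are arbitrary choices for demonstration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.normal(size=(n, p))

# Only the first 3 of 20 predictors truly matter.
beta = np.zeros(p)
beta[:3] = [3.0, -2.0, 1.5]
y = X @ beta + rng.normal(scale=0.5, size=n)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: selects variables
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks, never zeroes

# Count nonzero coefficients: LASSO sets irrelevant ones exactly to
# zero, performing selection; Ridge keeps all p predictors in the model.
n_lasso = int(np.sum(np.abs(lasso.coef_) > 1e-8))
n_ridge = int(np.sum(np.abs(ridge.coef_) > 1e-8))
print(n_lasso, n_ridge)
```

Because the Ridge solution is never exactly sparse, it cannot drop predictors, which is consistent with its weak performance for variable selection reported above; the L1-type penalties trade some bias for a simpler model.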
