Abstract
This study evaluates the performance of the Ridge, Elastic Net and Lasso regression methods in handling different degrees of multicollinearity among the independent variables of a multiple regression model, using simulated data. The researcher simulated data sets with sample sizes n = 200, 1000, 10000, 50000 and 100000 and p = 10 independent variables, and compared the performance of the three methods using the mean square error (MSE). The study found that the Elastic Net method outperforms the Ridge and Lasso methods in estimating the regression coefficients when the degree of multicollinearity is low, moderate or high, at every sample size considered. The Lasso method, however, is the most accurate estimator of the regression coefficients when the data contain severe multicollinearity and the sample size is below 10000 observations.
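The comparison described in the abstract can be sketched in a few lines. The block below is a minimal illustration, not the study's procedure: the way correlated predictors are generated (a shared factor giving pairwise correlation rho**2), the true coefficients (all 1), the fixed penalty `lam = 0.1`, the three `rho` levels, and the use of coefficient MSE are all assumptions made for the example. It uses only the Python standard library and a plain coordinate-descent solver with a fixed number of sweeps, so it is slow and approximate compared with a dedicated library.

```python
import math
import random


def simulate(n, p, rho, beta, rng):
    """Draw X whose columns have pairwise correlation rho**2, and y = X beta + noise."""
    X, y = [], []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)  # common factor that induces collinearity
        row = [math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0) + rho * shared
               for _ in range(p)]
        X.append(row)
        y.append(sum(b * x for b, x in zip(beta, row)) + rng.gauss(0.0, 1.0))
    return X, y


def soft_threshold(z, t):
    """Shrink z toward zero by t; values with |z| <= t become zero."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def elastic_net(X, y, lam, alpha, n_iter=100):
    """Coordinate descent for
        (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    alpha=1 is the lasso, alpha=0 is ridge. No intercept is fitted
    (the simulated data have mean zero) and there is no convergence check."""
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_ss = [sum(v * v for v in c) / n for c in cols]
    b = [0.0] * p
    resid = list(y)  # residual y - Xb for the current b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            c = cols[j]
            # Covariance of column j with the residual that excludes b[j].
            z = sum(cv * r for cv, r in zip(c, resid)) / n + col_ss[j] * b[j]
            new = soft_threshold(z, lam * alpha) / (col_ss[j] + lam * (1.0 - alpha))
            delta = new - b[j]
            if delta != 0.0:
                resid = [r - delta * cv for r, cv in zip(resid, c)]
                b[j] = new
    return b


def coef_mse(b_hat, beta):
    """Mean squared error of the estimated coefficients against the true ones."""
    return sum((bh - b) ** 2 for bh, b in zip(b_hat, beta)) / len(beta)


def compare(n=200, p=10, rhos=(0.5, 0.9, 0.99), lam=0.1, reps=3, seed=0):
    """Average coefficient MSE of ridge, elastic net and lasso at each rho."""
    rng = random.Random(seed)
    beta = [1.0] * p  # assumed true coefficients
    methods = {"ridge": 0.0, "elastic_net": 0.5, "lasso": 1.0}
    results = {}
    for rho in rhos:
        totals = dict.fromkeys(methods, 0.0)
        for _ in range(reps):
            X, y = simulate(n, p, rho, beta, rng)
            for name, alpha in methods.items():
                totals[name] += coef_mse(elastic_net(X, y, lam, alpha), beta)
        results[rho] = {name: t / reps for name, t in totals.items()}
    return results


if __name__ == "__main__":
    for rho, row in compare().items():
        print(rho, {name: round(v, 4) for name, v in row.items()})
```

Running the script prints one line per `rho` level with the average coefficient MSE of each method. Because the penalty is fixed rather than tuned and only a few replications are drawn, the ranking it produces should not be read as a reproduction of the study's findings.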
Highlights
Multiple linear regression is frequently employed, where the context is appropriate, to estimate a model that predicts the expected response or to explore the link between the dependent variable and the independent variables
Regression analysis rests on a large number of assumptions about the model; the most important one concerns multicollinearity, in addition to those concerning non-homogeneity of variance, autocorrelation and linearity
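In the model $Y = X\beta + \varepsilon$, $Y_{n \times 1}$ is the vector of the dependent variable, $X_{n \times p}$ is the matrix of independent variables, $\beta_{p \times 1}$ is the vector of regression coefficients to be estimated, and $\varepsilon_{n \times 1}$ is the vector of residuals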
Summary
Multiple linear regression is frequently employed, where the context is appropriate, to estimate a model that predicts the expected response or to explore the link between the dependent variable and the independent variables. The first goal, the model's prediction accuracy, is critical; the second, the model's complexity, is more important still. Common linear regression procedures are known for often performing poorly with respect to both prediction performance and model complexity (Doreswamy and Vastrad, 2013). Regression analysis rests on a large number of assumptions about the model; the most important one concerns multicollinearity, in addition to those concerning non-homogeneity of variance, autocorrelation and linearity. If one or more assumptions are violated, the model becomes unreliable and is no longer suitable for estimating population parameters (Herawati et al., 2018).
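For reference, the three methods compared in this study are usually written as penalized least-squares problems. The formulation below follows one common convention; the symbols $\lambda$ (penalty strength) and $\alpha$ (mixing weight) and the scaling constants are assumptions of this sketch and may differ from the parameterization used in the study.

```latex
\hat{\beta}(\lambda, \alpha)
  = \arg\min_{\beta}\;
    \frac{1}{2n}\,\lVert Y - X\beta \rVert_2^2
    + \lambda \left[ \alpha \lVert \beta \rVert_1
    + \frac{1-\alpha}{2}\, \lVert \beta \rVert_2^2 \right],
\qquad \lambda \ge 0,\; 0 \le \alpha \le 1 .
% \alpha = 0       : Ridge (squared L2 penalty only)
% \alpha = 1       : Lasso (L1 penalty only)
% 0 < \alpha < 1   : Elastic Net (mixture of both penalties)
```

Setting $\lambda = 0$ recovers ordinary least squares, whose coefficient estimates become unstable as the columns of $X$ grow more collinear; the penalty terms trade a small bias for a reduction in that variance.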