This study addresses the problem of multicollinearity among the independent variables of a regression model using Partial Least Squares (PLS) and Principal Component Analysis (PCA). Multicollinearity degrades the results of regression modeling and renders the traditional ordinary least squares (OLS) technique less reliable. The research centers on applying advanced PLS algorithms (SIMPLS and O-PLS) and PCA algorithms (NIPALS and SVD) to real-life data, and on integrating genetic algorithms (GAs) with these methods to optimize predictive performance. The relative efficiency of the methods is evaluated primarily through the Mean Squared Error (MSE), used as the criterion for comparison. The results show that PLS, and O-PLS in particular, outperforms PCA, yielding the lowest MSE both before and after embedding genetic algorithms. This underscores the effectiveness of PLS in mitigating multicollinearity and thereby enabling the formulation of highly predictive models. Further gains emerged from fine-tuning the GA techniques, indicating their value for enhancing regression modeling in complex data analysis. This ongoing research opens numerous possibilities for reinforcing regression methodology, especially in applications that demand a high level of prediction accuracy.