Abstract

In this article, we propose a modification of the partial least squares (PLS) regression algorithm. Certain steps of the algorithm are adjusted to address the estimation problem in multiple linear regression when there are missing data (MD). The modified algorithm, called PLS1-MD, is based on the available-data principle: it allows multiple regression analysis when values are missing in the response or in some of the explanatory variables, without the need for imputation. PLS1-MD can be applied under multicollinearity (the explanatory variables are correlated, so that columns of the design matrix are exact or near linear combinations of one another) and under high dimensionality (the number of individuals is smaller than the number of variables). The PLS1-MD algorithm ensures orthogonality, orthonormality of the coefficient vector, and optimality at each stage. The procedure is illustrated using the Cornell and Yarn datasets, which are widely known in the context of PLS1 regression. For this purpose, 10% of the data are randomly deleted and treated as MD. The results indicate that the estimates obtained with the PLS1-MD algorithm are very similar to those obtained when applying PLS1 to the set of observations with no MD. Because the new algorithm does not require imputing missing values, it preserves the properties of centrality and orthogonality. We compare the results of our approach with those obtained using the R packages pls and plsdepot. With no MD, all three give the same results. In the presence of MD, the pls package cannot be used, and plsdepot solves the problem only when the MD occur in the explanatory variables.
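To make the available-data principle concrete, the following is a minimal sketch of the classical NIPALS-style treatment of missing values in PLS1, where every inner product is computed only over the entries that are actually observed. It is not the PLS1-MD algorithm itself, whose modified steps are not spelled out in this abstract. The function names (`column_means`, `slope`, `pls1_available_data`), the use of `None` to mark a missing value, and the toy data are all assumptions made for illustration.

```python
import math


def column_means(rows):
    """Mean of each column, computed over observed (non-None) entries only."""
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return means


def slope(u, v):
    """Slope of u regressed on v through the origin, using only the
    positions where both u and v are observed (available-data principle)."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    den = sum(b * b for _, b in pairs)
    return sum(a * b for a, b in pairs) / den if den > 0 else 0.0


def pls1_available_data(X, y, n_components=1):
    """NIPALS-style PLS1 sketch; X is a list of rows, y a list, None = missing."""
    n, p = len(X), len(X[0])
    x_means = column_means(X)
    y_mean = column_means([[v] for v in y])[0]
    # Center with available-data means; missing entries stay missing.
    E = [[None if X[i][j] is None else X[i][j] - x_means[j] for j in range(p)]
         for i in range(n)]
    f = [None if v is None else v - y_mean for v in y]
    components = []
    for _ in range(n_components):
        # Weights: slope of each column of E on the response residual f.
        w = [slope([E[i][j] for i in range(n)], f) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm == 0:
            break  # nothing left to explain
        w = [v / norm for v in w]
        # Scores: slope of each row of E on w, over that row's observed entries.
        t = [slope(E[i], w) for i in range(n)]
        # X-loadings and the y-coefficient for this component.
        p_load = [slope([E[i][j] for i in range(n)], t) for j in range(p)]
        c = slope(f, t)
        # Deflate the observed entries only.
        for i in range(n):
            for j in range(p):
                if E[i][j] is not None:
                    E[i][j] -= t[i] * p_load[j]
            if f[i] is not None:
                f[i] -= c * t[i]
        components.append({"w": w, "t": t, "p": p_load, "c": c})
    return components


# Toy usage: one predictor, y = 2x, with one response value missing.
X_toy = [[1.0], [2.0], [3.0], [4.0], [5.0]]
y_toy = [2.0, 4.0, None, 8.0, 10.0]
fit = pls1_available_data(X_toy, y_toy, n_components=1)
print(fit[0]["w"], fit[0]["c"])
```

Note that a score is still produced for the row whose response is missing, since the scores depend only on the observed explanatory values. In this plain sketch, the orthogonality properties that hold for complete data are not guaranteed once entries are missing; the abstract states that PLS1-MD is constructed to ensure them.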
