Abstract

Partial least squares (PLS) performs well for high-dimensional regression problems, where the number of predictors can far exceed the number of observations. Like many other supervised learning techniques, PLS was developed in the framework of empirical risk minimization, which typically assumes that the test and training data are drawn from the same distribution. Any violation of this assumption can degrade PLS performance. Subsampling via an influence function is a recently developed and promising technique for addressing this problem. However, influence functions are only guaranteed to be accurate for sufficiently small changes to the model, limiting their application to small-scale datasets. To overcome this obstacle, a new form of the influence function for PLS is derived, and a framework for subsampling via an influence function for PLS is developed. Results on four simulated datasets and two real-world datasets illustrate the effectiveness of our method compared with classic PLS and two other subsampling frameworks.
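The paper's closed-form influence function for PLS is not reproduced in this abstract, so the following is only a minimal sketch of the general idea of influence-based subsampling, assuming scikit-learn's PLSRegression. It approximates each training point's influence by brute-force leave-one-out refitting (the expensive computation a derived influence function is meant to replace), then drops the most harmful points before refitting. The function names `loo_influence_scores` and `subsample_by_influence` are hypothetical, not from the paper.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression


def loo_influence_scores(X_train, y_train, X_val, y_val, n_components=2):
    """Approximate each training point's influence as the change in
    validation MSE when that point is removed (positive = harmful).

    This is a brute-force stand-in for the paper's derived influence
    function and is only practical for small training sets.
    """
    base = PLSRegression(n_components=n_components).fit(X_train, y_train)
    base_mse = np.mean((y_val - base.predict(X_val).ravel()) ** 2)
    scores = np.empty(len(X_train))
    for i in range(len(X_train)):
        mask = np.arange(len(X_train)) != i
        model = PLSRegression(n_components=n_components).fit(
            X_train[mask], y_train[mask]
        )
        mse = np.mean((y_val - model.predict(X_val).ravel()) ** 2)
        # If removing point i lowers validation error, the point was harmful.
        scores[i] = base_mse - mse
    return scores


def subsample_by_influence(X_train, y_train, scores, drop_frac=0.1):
    """Keep the least harmful points, dropping the top drop_frac by score."""
    n_keep = int(len(scores) * (1 - drop_frac))
    keep = np.argsort(scores)[:n_keep]  # lowest scores = least harmful
    return X_train[keep], y_train[keep]


# Hypothetical usage on synthetic high-dimensional data (n=60, p=100):
rng = np.random.default_rng(0)
X, y = rng.normal(size=(60, 100)), rng.normal(size=60)
X_tr, y_tr, X_va, y_va = X[:40], y[:40], X[40:], y[40:]
scores = loo_influence_scores(X_tr, y_tr, X_va, y_va)
X_sub, y_sub = subsample_by_influence(X_tr, y_tr, scores, drop_frac=0.1)
```

A closed-form influence function, as derived in the paper, would replace the leave-one-out refitting loop with a first-order approximation of each point's effect, avoiding the O(n) model refits that make this sketch impractical beyond small n.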
