Semi-supervised partial least squares

Xi Jin, Xing Zhang, Liang Tang, Qiwei Xie, Kaifeng Rao

doi:10.1142/s0219691320500149

Abstract

Traditional supervised dimensionality reduction methods can often build a good model only when a large number of labeled samples is available. However, in real-world applications where labeled data are scarce, these methods tend to perform poorly because of overfitting. In such cases, unlabeled samples can help improve performance. In this paper, we propose a semi-supervised dimensionality reduction method based on partial least squares (PLS), which we call semi-supervised partial least squares (S2PLS). To combine the labeled and unlabeled samples into an S2PLS model, we first apply the PLS algorithm to unsupervised dimensionality reduction. The final S2PLS model is then established by ensembling the supervised PLS model and the unsupervised PLS model, using the basic idea of the principal model analysis (PMA) method. Compared with unsupervised or supervised dimensionality reduction algorithms, S2PLS not only improves the prediction accuracy on the samples but also enhances the generalization ability of the model. Moreover, it can obtain better results even when there are only a few or no labeled samples. Experimental results on five UCI data sets confirm these properties of the S2PLS algorithm.
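As an illustration of the supervised building block only, the following minimal sketch computes the first weight vector of standard single-response PLS, which is proportional to X^T y on centered data, and projects the samples onto it. It is not the S2PLS method itself: the abstract does not specify the unsupervised PLS step or the PMA-based ensembling, so neither is shown. The function names (`center_columns`, `first_pls_direction`, `project`) and the toy data are assumptions made for this example.

```python
import math


def center_columns(rows):
    """Subtract each column's mean from that column."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def first_pls_direction(X, y):
    """Unit weight vector of the first PLS component for a single response.

    For centered X and y, the first PLS weight vector is proportional
    to X^T y, i.e. the direction of maximal covariance with y.
    """
    Xc = center_columns(X)
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]
    w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(len(X[0]))]
    norm = math.sqrt(sum(v * v for v in w))
    if norm == 0.0:
        raise ValueError("X and y have zero covariance; no PLS direction exists")
    return [v / norm for v in w]


def project(X, w):
    """One-dimensional scores t = Xc w, centering X with its own column means."""
    return [sum(a * b for a, b in zip(row, w)) for row in center_columns(X)]


# Hypothetical toy data: feature 0 tracks y, feature 1 is uncorrelated with y.
X_demo = [[1.0, 1.0], [2.0, -1.0], [3.0, -1.0], [4.0, 1.0]]
y_demo = [1.0, 2.0, 3.0, 4.0]

w_demo = first_pls_direction(X_demo, y_demo)
scores_demo = project(X_demo, w_demo)
print(w_demo)       # weight on feature 0 dominates
print(scores_demo)  # one-dimensional representation of the four samples
```

In this toy case the weight vector falls entirely on the feature that covaries with the response, which is the behavior a supervised reduction relies on and which becomes unreliable when only a few labeled samples are available.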

Full Text