Illustration of merits of semi-supervised learning in regression analysis

Hiromasa Kaneko

doi:10.1016/j.chemolab.2018.08.015

Abstract

Semi-supervised learning (SSL) is a method for learning the relationship between X and y, and the essential structure of the corresponding dataset, using both labeled and unlabeled data. Among SSL methods, this paper focuses on an approach that uses labeled and unlabeled samples together to reduce the dimensionality, and then performs regression analysis on the labeled samples in the resulting low-dimensional space. Although various SSL methods for regression have been developed, there has been insufficient discussion of why SSL is effective in regression analysis. Therefore, in this study, the merits of SSL in regression analysis are discussed in terms of the stability (or robustness) and applicability domains of regression models and the prior distribution of the X-variables. The advantages of SSL methods over fully supervised methods in regression are demonstrated using data from numerical simulations, quantitative structure–activity relationships, and quantitative structure–property relationships.
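The two-step approach described in the abstract (dimension reduction on all samples, then regression on the labeled samples only) can be illustrated with a minimal sketch. The abstract does not name specific algorithms, so the choice of principal component analysis for the dimension reduction and ordinary least squares for the regression is an assumption made here for illustration, as are the function name `fit_ssl_pca_regression` and the synthetic data.

```python
import numpy as np


def fit_ssl_pca_regression(X_labeled, y_labeled, X_unlabeled, n_components):
    """Hypothetical helper: PCA on labeled + unlabeled X, then OLS on labeled scores.

    PCA and OLS are assumed stand-ins for the dimension reduction and
    regression steps; they are not taken from the paper.
    """
    # Step 1: learn the low-dimensional space from labeled AND unlabeled X.
    X_all = np.vstack([X_labeled, X_unlabeled])
    mean = X_all.mean(axis=0)
    _, _, Vt = np.linalg.svd(X_all - mean, full_matrices=False)
    loadings = Vt[:n_components].T  # shape: (n_features, n_components)

    # Step 2: regress y on the scores of the labeled samples only.
    T = (X_labeled - mean) @ loadings
    A = np.column_stack([np.ones(len(T)), T])  # intercept column + scores
    coef, *_ = np.linalg.lstsq(A, y_labeled, rcond=None)

    def predict(X_new):
        # Project new samples into the same space, then apply the linear model.
        T_new = (np.asarray(X_new) - mean) @ loadings
        return coef[0] + T_new @ coef[1:]

    return predict


# Synthetic illustration (assumed data): many X-variables driven by one latent
# variable, with few labeled and many unlabeled samples.
rng = np.random.default_rng(0)
latent = rng.normal(size=220)
weights = rng.normal(size=15)
X = np.outer(latent, weights) + 0.1 * rng.normal(size=(220, 15))
y = 2.0 * latent + 1.0 + 0.05 * rng.normal(size=220)

predict = fit_ssl_pca_regression(X[:10], y[:10], X[10:200], n_components=1)
rmse = np.sqrt(np.mean((predict(X[200:]) - y[200:]) ** 2))
print(f"test RMSE: {rmse:.3f}")
```

In this sketch the unlabeled samples influence only the mean and the loadings, that is, the definition of the low-dimensional space; the regression coefficients themselves are estimated from the labeled samples alone.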

Full Text