Abstract

Semi-supervised learning (SSL) methods attempt to exploit unlabeled data to classify unseen data more accurately than is possible by learning from the available labeled data alone. Most SSL methods require the user to familiarize themselves with novel, complex concepts and to ensure that the underlying assumptions these methods make match the problem structure; otherwise, they risk a decrease in predictive performance. In this paper, we present the reliable semi-supervised ensemble learning (RESSEL) method, which exploits unlabeled data by using it to generate diverse classifiers through self-training and combines these classifiers into an ensemble for prediction. Our method functions as a wrapper around a supervised base classifier and refrains from introducing additional problem-dependent assumptions. We conduct experiments on a number of commonly used data sets to demonstrate its merit. The results show that RESSEL improves significantly upon the supervised alternatives, provided that the base classifier is able to produce adequate probability-based rankings. RESSEL is shown to be reliable in that it delivers results comparable to supervised learning methods when this requirement is not met, while it also broadens the range of good parameter values. Furthermore, RESSEL is demonstrated to outperform existing self-labeled wrapper approaches.
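The abstract does not spell out RESSEL's internals, so the following is only a minimal sketch of the general idea it describes: wrap a supervised base classifier, self-train several copies on confidently pseudo-labeled unlabeled data to obtain diverse members, and combine them by voting. The names (self_training_ensemble, n_members, n_pseudo) are hypothetical, and the use of bootstrap resampling as the source of member diversity is an assumption of this sketch, not the paper's actual procedure.

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def self_training_ensemble(base_clf, X_lab, y_lab, X_unlab,
                           n_members=5, n_pseudo=50, seed=0):
    """Illustrative sketch: diversify members via bootstrap resampling,
    then self-train each one on its own most confidently pseudo-labeled
    unlabeled examples."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        # Assumed source of diversity: a bootstrap resample of the labeled set.
        idx = rng.integers(0, len(X_lab), size=len(X_lab))
        clf = clone(base_clf).fit(X_lab[idx], y_lab[idx])
        # Self-training step: pseudo-label the unlabeled points this member
        # ranks as most confident. This relies on predict_proba, which mirrors
        # the abstract's requirement of adequate probability-based rankings.
        proba = clf.predict_proba(X_unlab)
        confident = np.argsort(proba.max(axis=1))[-n_pseudo:]
        pseudo_y = clf.classes_[proba[confident].argmax(axis=1)]
        X_aug = np.vstack([X_lab[idx], X_unlab[confident]])
        y_aug = np.concatenate([y_lab[idx], pseudo_y])
        members.append(clone(base_clf).fit(X_aug, y_aug))
    return members


def ensemble_predict(members, X):
    # Combine the self-trained members by majority vote.
    votes = np.stack([m.predict(X) for m in members])
    return np.apply_along_axis(
        lambda v: np.bincount(v.astype(int)).argmax(), 0, votes)


# Toy usage: treat 10% of a synthetic data set as labeled, the rest as unlabeled.
X, y = make_classification(n_samples=600, random_state=0)
X_lab, X_unlab, y_lab, y_hidden = train_test_split(
    X, y, train_size=0.1, random_state=0)
members = self_training_ensemble(
    LogisticRegression(max_iter=1000), X_lab, y_lab, X_unlab)
accuracy = (ensemble_predict(members, X_unlab) == y_hidden).mean()

Because the wrapper only needs fit, predict, and predict_proba from the base classifier, any probabilistic scikit-learn-style estimator can be dropped in, which is consistent with the abstract's framing of RESSEL as a wrapper method.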
