Abstract

In this article we focus on semi-supervised learning, the task of learning from both labeled and unlabeled data. We consider in particular the multiclass semi-supervised classification problem, and to solve it we propose a new multiclass loss function based on new codewords. The proposed loss function combines the classifier predictions, based on the labeled data, with the pairwise similarity between labeled and unlabeled examples; its main goal is to minimize the inconsistency between the classifier predictions and the pairwise similarity. The loss function consists of two terms: the first is the multiclass margin cost on the labeled data, and the second is a regularization term on the unlabeled data that minimizes the cost of the pseudo-margin on unlabeled examples. From the proposed risk function we then derive a new multiclass boosting algorithm, called GMSB, which also uses a set of optimal similarity functions for a given dataset. Experiments on a number of UCI and real-world biological, text, and image datasets show that GMSB outperforms state-of-the-art boosting methods for multiclass semi-supervised learning.
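The two-term structure described in the abstract can be sketched as follows. The abstract does not give GMSB's exact codewords, margin cost, or pseudo-margin form, so the functional choices below (an exponential multiclass margin cost on labeled data, and a similarity-weighted disagreement penalty standing in for the pseudo-margin regularizer) are illustrative assumptions, not the paper's actual objective:

```python
import numpy as np

def semi_supervised_multiclass_loss(scores_l, y_l, scores_u, similarity, lam=1.0):
    """Toy two-term semi-supervised multiclass loss (illustrative only).

    scores_l:   (n_l, k) classifier scores for labeled examples
    y_l:        (n_l,) true class indices for labeled examples
    scores_u:   (n_u, k) classifier scores for unlabeled examples
    similarity: (n_u, n_l) pairwise similarity, unlabeled vs. labeled
    lam:        weight of the regularization term
    """
    n_l = scores_l.shape[0]

    # Term 1: exponential cost of the multiclass margin on labeled data,
    # where the margin is the true-class score minus the best other score.
    true_scores = scores_l[np.arange(n_l), y_l]
    others = scores_l.copy()
    others[np.arange(n_l), y_l] = -np.inf
    margins = true_scores - others.max(axis=1)
    labeled_cost = np.exp(-margins).mean()

    # Term 2: regularizer penalizing inconsistency between predictions
    # and pairwise similarity -- high similarity between an unlabeled and
    # a labeled example is costly when their predicted classes disagree.
    pred_u = scores_u.argmax(axis=1)
    pred_l = scores_l.argmax(axis=1)
    agree = (pred_u[:, None] == pred_l[None, :]).astype(float)
    unlabeled_cost = (similarity * (1.0 - agree)).mean()

    return labeled_cost + lam * unlabeled_cost
```

Under this sketch, a classifier whose unlabeled predictions agree with their most similar labeled neighbors incurs only the labeled margin cost, while disagreement on highly similar pairs raises the total loss.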
