Abstract

Positive-unlabeled (PU) learning addresses the problem of learning a predictive model from positive and unlabeled data. The past few years have witnessed rapid progress in PU learning, yet existing algorithms are limited to binary classification and cannot be directly applied to multi-class PU data. In this paper, we present an unbiased estimator of the original classification risk for multi-class PU learning, and show that direct empirical risk minimization suffers from severe overfitting because the risk is unbounded below. To address this problem, we propose an alternative risk estimator and theoretically establish an estimation error bound, showing that the estimation error of its empirical risk minimizer achieves the optimal parametric convergence rate. Extensive experiments on multiple datasets demonstrate the effectiveness of the proposed approach for multi-class PU learning.
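
To make the core idea concrete, below is a minimal sketch of a non-negatively corrected risk for multi-class PU data, assuming known class priors, a softmax classifier, and cross-entropy loss. It illustrates the general technique described in the abstract (rewriting the unobserved negative-class risk via the unlabeled mixture and clipping it at zero to keep the empirical risk bounded below, in the spirit of the non-negative correction of Kiryo et al., 2017); the paper's exact estimator may differ. All function and variable names here are hypothetical.

```python
import torch
import torch.nn.functional as F

def nn_mpu_risk(logits_p, labels_p, logits_u, priors):
    """Sketch of a non-negative risk estimate for multi-class PU data.

    Classes 0..K-1 are the labeled positive classes; class K is the
    unobserved negative class. priors[k] is pi_k = P(y = k), assumed known.
    Assumes every labeled class appears at least once in the batch.
    """
    K = logits_p.shape[1] - 1  # index K is the negative class

    # Risk on labeled positives: per-class cross-entropy weighted by priors.
    ce_p = F.cross_entropy(logits_p, labels_p, reduction="none")
    risk_p = torch.stack([
        priors[k] * ce_p[labels_p == k].mean() for k in range(K)
    ]).sum()

    # Loss of assigning the negative class (index K) to unlabeled data.
    neg_u = torch.full((logits_u.shape[0],), K, device=logits_u.device)
    loss_u_as_neg = F.cross_entropy(logits_u, neg_u, reduction="none").mean()

    # Same loss on labeled positives, used to subtract the positive
    # components of the unlabeled mixture p_u = sum_k pi_k p_k + pi_n p_n.
    neg_p = torch.full((logits_p.shape[0],), K, device=logits_p.device)
    loss_p_as_neg = F.cross_entropy(logits_p, neg_p, reduction="none")
    correction = torch.stack([
        priors[k] * loss_p_as_neg[labels_p == k].mean() for k in range(K)
    ]).sum()

    # Unbiased estimate of the negative-class risk; clipping at zero keeps
    # the empirical risk bounded below and curbs overfitting.
    risk_n = torch.clamp(loss_u_as_neg - correction, min=0.0)
    return risk_p + risk_n
```

Without the `torch.clamp`, the subtracted correction term can drive the empirical risk arbitrarily far below zero for flexible models, which is exactly the overfitting failure mode the abstract identifies for the plain unbiased estimator.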
