Abstract

In multilabel text classification, the label information of training instances may be incomplete due to various factors, such as the high cost of manual labeling and annotation errors, which poses a serious challenge to training classifiers. To tackle the problem of incomplete labels, we propose to exploit label correlations and to jointly leverage manifold regularization and a label confidence constraint. On the one hand, label manifold regularization preserves the local label structure and thus copes with highly incomplete labels. On the other hand, the label confidence constraint, grounded in likelihood theory, avoids overestimating the weights of negative labels, leading to safer inference. Based on these ideas, we propose a novel generative model, the Bayesian Model with Label manifold Regularization and label Confidence constraint (BM-LRC), which is solved by generalized expectation maximization. Empirical results on popular benchmark data sets show that BM-LRC achieves competitive performance, especially when the available labels are scarce.
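To make the manifold-regularization idea concrete, the following is a minimal illustrative sketch (not the paper's BM-LRC implementation) of a standard label manifold penalty tr(F&#8322; L F), where L is the graph Laplacian of an instance-similarity graph and F is the instance-by-label assignment matrix; the penalty is small when similar instances carry similar label vectors. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def knn_affinity(X, k=2, sigma=1.0):
    """Gaussian affinity restricted to each point's k nearest neighbours,
    then symmetrized. An illustrative similarity graph over instances X."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # keep only each row's k largest affinities, then symmetrize
    mask = np.zeros_like(W, dtype=bool)
    idx = np.argsort(-W, axis=1)[:, :k]
    for i in range(n):
        mask[i, idx[i]] = True
    return np.where(mask | mask.T, W, 0.0)

def manifold_penalty(F, W):
    """tr(F^T L F) with graph Laplacian L = D - W.
    Equals (1/2) * sum_ij W_ij * ||f_i - f_j||^2, so it penalizes label
    vectors that differ sharply between strongly connected instances."""
    L = np.diag(W.sum(axis=1)) - W
    return float(np.trace(F.T @ L @ F))

# Toy data: instances 0 and 1 are near-duplicates, instance 2 is far away.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
W = knn_affinity(X)
F_smooth = np.array([[1, 0], [1, 0], [0, 1]], dtype=float)  # neighbours agree
F_rough = np.array([[1, 0], [0, 1], [0, 1]], dtype=float)   # neighbours disagree
assert manifold_penalty(F_smooth, W) < manifold_penalty(F_rough, W)
```

In a model like the one the abstract describes, a term of this form would be added to the training objective so that inferred labels stay locally consistent on the instance manifold even when many ground-truth labels are missing.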
