Co-STM text categorization method based on Supervised Topic Model

Genpeng Zhang, Xiaoyan Liu, Han Zheng

doi:10.1109/aemcse51986.2021.00101

Abstract

Traditional text classification methods are trained only on homogeneous data with a uniform sample distribution and complete category labels. With the rapid development of computing technology, today's data come in diverse carrier forms, often lack labels, and have imbalanced categories. As a result, traditional text classification techniques can no longer meet current requirements. To address these new characteristics, this paper proposes co-STM (collaborative text classification combined with Supervised Topic Model), a collaborative text classification algorithm that integrates the supervised topic model SLDA (Supervised LDA). For category-imbalanced data, the paper adopts a confidence calculation method based on posterior probability distance and a category-based unlabeled sample selection strategy to select credible unlabeled samples. Comparative experiments show that the co-STM algorithm can effectively improve the performance of semi-supervised text classification.
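
The abstract does not define the confidence measure or the selection rule, so the following is only an illustrative sketch under stated assumptions, not the authors' implementation. It assumes that "posterior probability distance" can be approximated by the gap between the two largest class posteriors of a sample, and that "category-based" selection means choosing the most confident unlabeled samples separately within each predicted class, so that minority classes are not crowded out. The names `margin_confidence`, `select_per_class`, `per_class`, and `min_conf` are hypothetical.

```python
from collections import defaultdict


def margin_confidence(posterior):
    """Assumed confidence: gap between the two largest posterior probabilities."""
    top = sorted(posterior, reverse=True)
    return top[0] - top[1]


def select_per_class(posteriors, per_class=1, min_conf=0.0):
    """Pick the most confident unlabeled samples separately for each predicted class.

    posteriors: list of per-sample class probability lists (hypothetical input,
    e.g. the output of any probabilistic classifier).
    Returns a dict mapping predicted class index -> list of sample indices.
    """
    by_class = defaultdict(list)
    for idx, p in enumerate(posteriors):
        label = max(range(len(p)), key=p.__getitem__)  # predicted class
        by_class[label].append((margin_confidence(p), idx))

    selected = {}
    for label, scored in by_class.items():
        # Highest confidence first; ties broken by the lower sample index.
        scored.sort(key=lambda t: (-t[0], t[1]))
        selected[label] = [idx for conf, idx in scored[:per_class] if conf >= min_conf]
    return selected


# Toy posteriors for five unlabeled documents over three classes (made-up values).
posteriors = [
    [0.90, 0.05, 0.05],
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.40, 0.35, 0.25],
    [0.10, 0.20, 0.70],
]
selected = select_per_class(posteriors, per_class=1)
print(selected)
```

Selecting per class rather than taking a single global top-k is one plausible way to keep a frequent class from dominating the pseudo-labeled set; the paper's actual strategy may differ.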

Full Text