Abstract

Traditional text classification methods are trained only on homogeneous data with a uniform sample distribution and complete category labels. With the rapid development of computing technology, today's data comes in diverse carrier forms, often lacks labels, and is imbalanced across categories. As a result, traditional text classification techniques no longer meet current requirements. To address these characteristics, this paper proposes co-STM (collaborative text classification combined with Supervised Topic Model), a collaborative text classification algorithm that integrates the SLDA (Supervised LDA) supervised topic model. For category-imbalanced data, the paper adopts a confidence calculation method based on posterior probability distance and a category-based unlabeled sample selection strategy to select credible unlabeled samples. Comparative experiments show that the co-STM algorithm can effectively improve the performance of semi-supervised text classification.
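The selection step named in the abstract can be illustrated with a minimal sketch. The abstract does not define the posterior probability distance, so this example assumes it is the gap between the two largest class posteriors, and it assumes that category-based selection means a fixed quota of samples per predicted class. The names `margin_confidence`, `select_per_category`, and `per_class` are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def margin_confidence(posterior):
    """Gap between the two largest posteriors (an assumed definition)."""
    top = sorted(posterior, reverse=True)
    return top[0] - top[1] if len(top) > 1 else top[0]


def select_per_category(posteriors, per_class):
    """Pick up to `per_class` most confident unlabeled samples per predicted class.

    posteriors: list of per-sample class-probability lists.
    Returns a sorted list of (sample_index, predicted_class) pairs.
    """
    by_class = defaultdict(list)
    for idx, p in enumerate(posteriors):
        predicted = max(range(len(p)), key=p.__getitem__)
        by_class[predicted].append((margin_confidence(p), idx))

    selected = []
    for label, scored in by_class.items():
        # Highest confidence first; ties are broken by the lower sample index.
        scored.sort(key=lambda t: (-t[0], t[1]))
        selected.extend((idx, label) for _, idx in scored[:per_class])
    return sorted(selected)


# Toy posteriors for four unlabeled samples over two classes (made-up values).
posteriors = [
    [0.90, 0.10],  # confident, class 0
    [0.55, 0.45],  # uncertain, class 0
    [0.20, 0.80],  # confident, class 1
    [0.70, 0.30],  # moderately confident, class 0
]
picked = select_per_category(posteriors, per_class=1)
```

Applying the quota per class rather than over the whole pool keeps a majority class from filling every slot, which is one plausible way to handle the category imbalance the abstract describes.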
