Abstract

Semi-supervised transfer learning is an effective technique for improving few-shot learning performance. To address the difficulty of obtaining large amounts of labeled data, this paper proposes a text classification method based on semi-supervised transfer learning (TC_SSTL). TC_SSTL makes full use of readily available unlabeled data, letting its underlying information help the model learn text features. First, TC_SSTL applies data augmentation to the unlabeled data. The augmented data, unlabeled data, and labeled data are then fed into the pre-trained model together, and pseudo-labeling is used to train semi-supervised on the unlabeled and augmented data. At the same time, the pre-trained model is fine-tuned using discriminative fine-tuning. On short text classification tasks, TC_SSTL achieves the best performance using only 1,000 labeled examples.
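The pseudo-labeling step described above can be sketched as a generic self-training loop: predict on unlabeled examples, adopt predictions above a confidence threshold as labels, and retrain. This is a minimal illustrative sketch, not the paper's implementation; the nearest-centroid classifier, the `confidence_threshold` name, and the toy 2-D data are all assumptions standing in for the pre-trained text model.

```python
# Hypothetical sketch of pseudo-label self-training (not the paper's code).
# A nearest-centroid classifier stands in for the pre-trained model.

def centroid(points):
    # Mean of a list of equal-length tuples.
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict(centroids, x):
    # Return (label, confidence); confidence from inverse-distance weighting.
    ds = {label: dist(c, x) for label, c in centroids.items()}
    label = min(ds, key=ds.get)
    total = sum(1.0 / (d + 1e-9) for d in ds.values())
    conf = (1.0 / (ds[label] + 1e-9)) / total
    return label, conf

def self_train(labeled, unlabeled, rounds=3, confidence_threshold=0.8):
    # labeled: dict mapping label -> list of points; unlabeled: list of points.
    pool = {lab: list(pts) for lab, pts in labeled.items()}
    for _ in range(rounds):
        centroids = {lab: centroid(pts) for lab, pts in pool.items()}
        still_unlabeled = []
        for x in unlabeled:
            lab, conf = predict(centroids, x)
            if conf >= confidence_threshold:
                pool[lab].append(x)  # adopt the pseudo-label as ground truth
            else:
                still_unlabeled.append(x)  # revisit in a later round
        unlabeled = still_unlabeled
    return {lab: centroid(pts) for lab, pts in pool.items()}
```

The discriminative fine-tuning the abstract mentions is, in the ULMFiT sense, assigning smaller learning rates to lower layers of the pre-trained model than to upper layers during fine-tuning; in a framework such as PyTorch this is typically done with per-layer optimizer parameter groups.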
