Abstract

The rapid development of computer technology has produced massive amounts of short text data, typified by e-commerce user comments. Because short text has sparse features and is updated quickly, effective methods for classifying it have attracted wide attention and seen broad application. This paper proposes IRDP-STC, a short text classification method that integrates R-Drop with a pre-trained language model. The original data and generated data are both fed into the pre-trained language model BERT with Dropout enabled, and a KL-divergence loss is added to pull the two outputs toward each other. IRDP-STC makes full use of the information in short text, maintains high classification performance while also accounting for model robustness, and is better suited to scenarios with few training samples. Experimental results on the THUCNews dataset are better than those of the baseline model, which verifies the effectiveness and generality of the proposed method.
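
The consistency objective described above can be illustrated with a minimal sketch. It assumes the commonly used R-Drop formulation: cross-entropy on both outputs plus a weighted symmetric KL term. The abstract does not state the exact loss used by IRDP-STC, so the function names, the weight `alpha`, and the plain-Python logits standing in for BERT outputs are all hypothetical.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of class logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def kl_divergence(log_p, log_q):
    """KL(p || q) computed from log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def r_drop_loss(logits_a, logits_b, label, alpha=1.0):
    """Hypothetical R-Drop style loss for one example.

    logits_a, logits_b: class logits from two passes (for instance the
    original and the generated input, each through a model with Dropout).
    label: index of the gold class.
    alpha: assumed weight of the consistency term (not from the paper).
    """
    log_p = log_softmax(logits_a)
    log_q = log_softmax(logits_b)
    # Supervised term: cross-entropy on both outputs.
    ce = -log_p[label] - log_q[label]
    # Consistency term: symmetric KL between the two output distributions.
    sym_kl = 0.5 * (kl_divergence(log_p, log_q) + kl_divergence(log_q, log_p))
    return ce + alpha * sym_kl

if __name__ == "__main__":
    # Two slightly different outputs for the same three-class example.
    print(r_drop_loss([2.0, 0.5, -1.0], [1.5, 0.8, -0.7], label=0, alpha=1.0))
```

When the two passes agree exactly, the KL term vanishes and only the cross-entropy remains; the further the two output distributions drift apart, the larger the penalty becomes.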
