Abstract

In the multi-label text classification task, a text usually corresponds to multiple label categories, and the labels have correlation and hierarchical structure. However, when the label hierarchy is unknown, the number of various labels is not balanced, which makes it difficult for the model to classify low-frequency labels. In addition, labels have semantic similarities that make it difficult for the model to distinguish between them. In this article, we propose a multi-label text classification model based on multi-level constraint augmentation and label association attention. Compared with traditional methods, our method has two contributions: (1) In order to alleviate the problem of unbalanced number of different label categories and ensure the rationality of sample generation, we propose a data augmentation method based on multi-level constraints. In the process of sample generation, this method uses historical generation information, sample original text information, and sample topic to constrain the generated text. (2) In order to make the model recognize the associated labels accurately, we propose an interaction mechanism based on label association attention and filter gate. This method combines text information and label weight information. At the same time, our classification model considers the important weights of text sentences and effectively utilizes the co-occurrence relationship between labels. Experimental results on three benchmark datasets show that our model outperforms state-of-the-art methods on all main evaluation metrics, especially on low-frequency label prediction with sparse samples.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call