Abstract

Hierarchical text classification aims to assign text to multiple labels drawn from a label set organized as a tree. Existing algorithms mainly inject prior information about the label hierarchy, but the implicit correlations between labels within that hierarchy are rarely exploited. In addition, extensive experiments show that the inherent class imbalance among chained labels degrades classification performance on lower-level labels. To address these issues, a label structure enhanced hierarchy-aware global model (LSE-HiAGM) is proposed. First, a common density coefficient is defined to measure the importance of a pair of labels in the hierarchical structure. Second, the common density coefficient is used as a label weight to update the topological structure features, so that each label is linked to all other labels globally. Finally, the topological structure features, text features, and label hierarchy features are fused to make full use of all available features and improve the embedding quality of low-level labels. Furthermore, to alleviate the class imbalance problem, a new loss function is used to constrain model training: the probability of a label being sampled relative to all labels of the sample serves as its loss weight, imposing a small penalty on upper-level labels and a large penalty on lower-level labels. Extensive experiments on the RCV1, WOS, and NYT datasets show that LSE-HiAGM outperforms the baseline models in hierarchical text classification.
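As a rough illustration of the loss weighting described above, the following PyTorch sketch rescales a multi-label binary cross-entropy loss per label so that rarely assigned lower-level labels receive a larger penalty than frequently assigned upper-level labels. The function names (label_weights, weighted_bce_loss), the inverse-probability weighting, and the normalization are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a label-frequency-weighted multi-label loss (assumed form).
import torch
import torch.nn.functional as F

def label_weights(label_matrix: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """label_matrix: (num_samples, num_labels) multi-hot targets.
    Returns one weight per label, inversely proportional to its sampling
    probability over all assigned labels, so rare lower-level labels get a
    large weight (assumption: the paper may normalize differently)."""
    counts = label_matrix.sum(dim=0)            # how often each label is assigned
    prob = counts / (label_matrix.sum() + eps)  # probability relative to all labels
    weights = 1.0 / (prob + eps)                # rare (lower-level) labels -> large weight
    return weights / weights.mean()             # keep the overall loss scale stable

def weighted_bce_loss(logits: torch.Tensor, targets: torch.Tensor,
                      weights: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy over the label set, rescaled per label."""
    return F.binary_cross_entropy_with_logits(
        logits, targets, weight=weights.unsqueeze(0).expand_as(targets))

# Toy usage: 3 samples, 4 labels (label 0 is a frequent root, label 3 a rare leaf).
targets = torch.tensor([[1., 1., 0., 0.],
                        [1., 0., 1., 0.],
                        [1., 0., 0., 1.]])
logits = torch.randn(3, 4)
loss = weighted_bce_loss(logits, targets, label_weights(targets))
```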
