Abstract

Hierarchical Text Classification (HTC) is a challenging task that involves classifying textual descriptions into a taxonomic hierarchy. Existing methods, however, struggle to model hierarchical label structures adequately: they focus on using graph embedding methods to encode the hierarchy while disregarding the fact that HTC labels are organized as a tree. This distinction matters because, unlike a general graph, a tree has an inherent direction that governs how information flows from one node to another, which is a critical factor when graph embedding is applied to HTC. In a graph, by contrast, message passing is undirected, which leads to imbalanced message transmission between nodes when applied to HTC. To address this, we propose a unidirectional message-passing multi-label generation model for HTC, referred to as UMP-MG. Rather than treating HTC as a classification problem, as previous methods have done, this approach formulates it as a sequence generation task and introduces prior hierarchical information during decoding. This formulation further makes it possible to block information flow in one direction, so that the graph embedding method is better suited to the HTC task, resulting in an enhanced representation of the tree structure. Experiments on both the public WOS dataset and an e-commerce user intent classification dataset demonstrate that the proposed model achieves superior results.
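To make the contrast between undirected and unidirectional message passing concrete, the following is a minimal sketch, not the UMP-MG implementation. It assumes a hypothetical toy taxonomy, simple mean aggregation with a residual self term, and a top-down (parent to child) direction. The abstract does not say which direction is blocked, so that choice is an assumption. The helper `allowed_next_labels` is likewise a hypothetical illustration of how a hierarchy prior could restrict the next label during sequence generation.

```python
from typing import Dict, List, Optional

# Hypothetical toy label tree (assumption): child -> parent, None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "root": None,
    "CS": "root",
    "Medical": "root",
    "ML": "CS",
    "Security": "CS",
    "Databases": "CS",
    "Cancer": "Medical",
}


def build_adjacency(parent: Dict[str, Optional[str]], directed: bool) -> Dict[str, List[str]]:
    """Map each node to the nodes it receives messages from."""
    incoming: Dict[str, List[str]] = {node: [] for node in parent}
    for child, par in parent.items():
        if par is None:
            continue
        incoming[child].append(par)      # parent -> child is always allowed
        if not directed:
            incoming[par].append(child)  # child -> parent only in the undirected case
    return incoming


def message_pass(features: Dict[str, List[float]],
                 incoming: Dict[str, List[str]],
                 steps: int = 1) -> Dict[str, List[float]]:
    """Mean-aggregate neighbour features and add them to each node's own state."""
    h = {n: list(v) for n, v in features.items()}
    for _ in range(steps):
        new_h = {}
        for node, vec in h.items():
            senders = incoming[node]
            if senders:
                agg = [sum(h[s][i] for s in senders) / len(senders) for i in range(len(vec))]
            else:
                agg = [0.0] * len(vec)
            new_h[node] = [a + b for a, b in zip(vec, agg)]
        h = new_h  # synchronous update: every node reads the previous step's states
    return h


def allowed_next_labels(parent: Dict[str, Optional[str]], previous: str) -> List[str]:
    """Hypothetical hierarchy prior: the next generated label must be a child of the previous one."""
    return sorted(c for c, p in parent.items() if p == previous)


if __name__ == "__main__":
    undirected = build_adjacency(PARENT, directed=False)
    directed = build_adjacency(PARENT, directed=True)
    # Only the root carries a signal initially.
    feats = {n: [1.0 if n == "root" else 0.0] for n in PARENT}
    u = message_pass(feats, undirected, steps=1)
    d = message_pass(feats, directed, steps=1)
    print(u["CS"], u["Medical"])  # [0.25] [0.5]: root signal diluted by fan-out
    print(d["CS"], d["Medical"])  # [1.0] [1.0]: each child hears only its parent
    print(allowed_next_labels(PARENT, "CS"))  # ['Databases', 'ML', 'Security']
```

In the undirected case, the root's signal reaching `CS` is diluted more than the signal reaching `Medical`, only because `CS` has more children. This is one simple way to see the "imbalance of message transmission" the abstract refers to. Blocking the child-to-parent direction removes that dependence on fan-out.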
