Abstract

Hierarchical multi-label text classification (HMTC) deals with the challenging task in which an instance can be assigned to multiple hierarchically structured categories at the same time. Most prior studies either reduce the HMTC task to a flat multi-label problem, ignoring the vertical correlations between categories, or exploit the dependencies across hierarchical levels without considering the horizontal correlations among categories at the same level; either choice inevitably leads to a fundamental loss of information. In this paper, we propose a novel HMTC framework that considers both vertical and horizontal category correlations. Specifically, we first design a loosely coupled graph convolutional neural network as the representation extractor to obtain representations for words, documents, and, more importantly, level-wise representations for categories, which previous works do not consider. Then, the learned category representations are used to capture the vertical dependencies among the levels of the category hierarchy and to model the horizontal correlations. Finally, based on the document and category embeddings, we design a hybrid algorithm to predict categories across the entire hierarchical structure. Extensive experiments on real-world HMTC datasets validate the effectiveness of the proposed framework, with significant improvements over the baselines.
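The abstract does not spell out how the "loosely coupled" propagation of the proposed network works, so the sketch below shows only the standard graph convolution it builds on: the Kipf & Welling formulation ReLU(D^{-1/2}(A + I)D^{-1/2} H W), applied to a hypothetical graph of word, document, and category nodes. The node names, edges, one-hot features, and weight values are all illustrative assumptions, and the helpers `matmul`, `normalized_adjacency`, and `gcn_layer` are hypothetical names; this is not the authors' LCGCN.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b (rows of a times columns of b)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops and apply symmetric normalization: D^{-1/2} (A + I) D^{-1/2}."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # node degrees, counting the self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One standard graph convolution: ReLU(A_norm @ H @ W)."""
    z = matmul(matmul(adj_norm, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Hypothetical toy graph: two word nodes, one document node, one category node.
# Edges (assumed): word-document (word occurs in document), document-category (training label).
nodes = ["w:graph", "w:neural", "d:doc1", "c:AI"]
adj = [
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
features = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]  # one-hot node features
weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2], [0.2, 0.1]]  # illustrative 4x2 weight matrix

a_norm = normalized_adjacency(adj)
h1 = gcn_layer(a_norm, features, weights)
for name, vec in zip(nodes, h1):
    print(name, [round(v, 3) for v in vec])
```

Stacking a second layer would let word information reach the category node through the shared document node, which is one way category representations can be learned jointly with word and document representations. How LCGCN loosens the coupling between node types, and how it produces separate level-wise category representations, is not described in this summary.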

Highlights

  • As a fundamental problem in natural language processing (NLP), text classification is the task of assigning a given document to one or multiple categories according to its textual content

  • Many documents are tagged with multiple categories that can be organized in a tree or a Directed Acyclic Graph (DAG) (Wehrmann et al., 2018), which poses a more challenging task

  • We propose the Loosely Coupled Graph Convolutional Neural Network (LCGCN) as the representation component

Summary

Introduction

As a fundamental problem in natural language processing (NLP), text classification is the task of assigning a given document to one or multiple categories according to its textual content. Many documents are tagged with multiple categories that can be organized in a tree or a Directed Acyclic Graph (DAG) (Wehrmann et al., 2018), which poses a more challenging task. These categories are organized into different hierarchical levels. The characteristics of the categories are encoded both in the horizontal correlations among categories at the same level and in the vertical dependencies between categories at different levels of the hierarchy. When we are confident that the document is tagged with hierarchical classification, …

[Figure: example category hierarchy (labels: C0, Artificial Intelligence, Computer Vision, Natural Language Processing)]
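To make the vertical/horizontal distinction concrete, here is a minimal sketch assuming a tree (not a DAG) stored as a hypothetical child-to-parent map built from the category names in the figure. Vertical dependencies appear as the requirement that a document's label set be closed under ancestors (a document tagged Computer Vision is also tagged Artificial Intelligence). Horizontal correlations are approximated here, purely for illustration, by counting how often two categories at the same level are co-assigned; the paper instead models them with learned category representations. The map `PARENT` and the functions `depth`, `close_upward`, `level_wise`, and `same_level_cooccurrence` are hypothetical names.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical tree: child -> parent (None marks a top-level category).
PARENT: Dict[str, Optional[str]] = {
    "Artificial Intelligence": None,
    "Computer Vision": "Artificial Intelligence",
    "Natural Language Processing": "Artificial Intelligence",
}


def depth(category: str) -> int:
    """Level of a category in the tree (top level = 1)."""
    d = 1
    parent = PARENT[category]
    while parent is not None:
        d += 1
        parent = PARENT[parent]
    return d


def close_upward(labels: Iterable[str]) -> Set[str]:
    """Add every ancestor so the label set respects the vertical (parent-child) constraint."""
    closed: Set[str] = set()
    for c in labels:
        node: Optional[str] = c
        while node is not None:
            closed.add(node)
            node = PARENT[node]
    return closed


def level_wise(labels: Iterable[str]) -> Dict[int, Set[str]]:
    """Group a label set by hierarchy level."""
    levels: Dict[int, Set[str]] = {}
    for c in labels:
        levels.setdefault(depth(c), set()).add(c)
    return levels


def same_level_cooccurrence(docs: List[Set[str]]) -> Counter:
    """Count co-assigned category pairs at the same level (a crude horizontal-correlation signal)."""
    counts: Counter = Counter()
    for labels in docs:
        for cats in level_wise(close_upward(labels)).values():
            counts.update(combinations(sorted(cats), 2))
    return counts


# Hypothetical leaf-level annotations for three documents.
docs = [
    {"Computer Vision"},
    {"Computer Vision", "Natural Language Processing"},
    {"Natural Language Processing"},
]
print(level_wise(close_upward(docs[1])))
print(same_level_cooccurrence(docs))
```

A flat multi-label model would see only the leaf sets in `docs`; the level-wise view keeps the parent-child structure, and the same-level pairs expose the sibling relationships that a purely level-by-level model would ignore. Supporting a DAG would require a list of parents per category and a rule for assigning levels.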
Graph Convolutional Neural Networks
Methodology
Vertical Category Correlations
Loss Function
Datasets
Baselines and Parameter Settings
Ablation Study
Visualization of Horizontal Correlations
Findings
Conclusion