Abstract

Many tasks in the biomedical domain require the assignment of one or more predefined labels to input text, where the labels are part of a hierarchical structure (such as a taxonomy). The conventional approach is a one-vs.-rest (OVR) classification setup, in which a binary classifier is trained for each label in the taxonomy or ontology and all instances not belonging to the class are treated as negative examples. This approach has two main drawbacks: dependencies between classes are not leveraged during training and classification, and training parallel classifiers incurs additional computational cost. In this paper, we apply a new method for hierarchical multi-label text classification that initializes a neural network model's final hidden layer so that it leverages label co-occurrence relations such as hypernymy. This approach lends itself naturally to hierarchical classification. We evaluated it on two hierarchical multi-label text classification tasks in the biomedical domain, at both the sentence and document level. Our evaluation shows promising results for this approach.

Highlights

  • Many tasks in biomedical natural language processing require the assignment of one or more labels to input text, where there exists some structure between the labels: for example, the assignment of Medical Subject Headings (MeSH) to PubMed abstracts (Lipscomb, 2000). A typical approach to classifying multi-label documents is to construct a binary classifier for each label in the taxonomy or ontology, where all documents not belonging to the class are considered negative examples, i.e. one-vs.-rest (OVR) classification (Hong and Cho, 2008)

  • This approach can work with established neural network architectures such as a convolutional neural network (CNN) by initializing the final output layer to leverage the co-occurrences between the labels in the training data

  • There are many tasks in the biomedical domain that require the assignment of one or more labels to input text. These labels often exist within some hierarchical structure
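The co-occurrence-based initialization described in the highlights can be sketched with NumPy. The label names, the toy label matrix, and the row normalization below are illustrative assumptions for a three-label taxonomy, not the paper's exact procedure:

```python
import numpy as np

# Toy training labels (hypothetical): rows are documents, columns are
# labels in a small taxonomy [Disease, Cancer, Infection], where
# Cancer and Infection are subclasses of Disease. Whenever a subclass
# is assigned, its superclass is assigned too.
Y = np.array([
    [1, 1, 0],   # doc labeled Cancer (and therefore Disease)
    [1, 0, 1],   # doc labeled Infection (and therefore Disease)
    [1, 1, 0],
    [1, 0, 0],   # doc labeled Disease only
])

# Co-occurrence counts: cooc[i, j] = number of training documents
# carrying both label i and label j.
cooc = Y.T @ Y

# Row-normalize by each label's own count to estimate P(label j | label i).
# A matrix like this can seed the final output layer of a network (e.g. a
# CNN) so that hierarchically related labels start with correlated weights.
init = cooc / cooc.diagonal()[:, None]
```

Note how the hierarchy shows up in the estimates: every Cancer document is also a Disease document, so the conditional probability of Disease given Cancer is 1.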



Introduction

A typical approach to classifying multi-label documents is to construct a binary classifier for each label in the taxonomy or ontology, where all documents not belonging to the class are considered negative examples, i.e. one-vs.-rest (OVR) classification (Hong and Cho, 2008). This approach has two major drawbacks: first, it makes the hard assumption that the classes are independent, which often does not reflect reality; second, it is more computationally expensive (albeit by a constant factor), and if there is a very large number of classes, the approach becomes computationally unrealistic. We investigate a simple and computationally fast approach to multi-label classification with a focus on labels that share a structure, such as a hierarchy (taxonomy). When multiple labels are assigned to a text, an explicit subclass label implicitly includes all of its superclasses.
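The superclass implication at the end of the paragraph amounts to taking the upward closure of a label set over the taxonomy. A minimal sketch, assuming a hypothetical parent map (the label names below are illustrative, not from the paper):

```python
# Hypothetical taxonomy: each label maps to its direct superclass
# (None marks the root).
parents = {
    "carcinoma": "neoplasm",
    "neoplasm": "disease",
    "disease": None,
}

def close_upward(labels, parents):
    """Return the label set extended with all implied superclasses."""
    closed = set(labels)
    for label in labels:
        parent = parents.get(label)
        while parent is not None:   # walk up to the root
            closed.add(parent)
            parent = parents.get(parent)
    return closed

# A document explicitly labeled "carcinoma" is implicitly a "neoplasm"
# and a "disease":
# close_upward({"carcinoma"}, parents)
# → {"carcinoma", "neoplasm", "disease"}
```

Applying this closure to the training labels makes the hierarchical dependency explicit in the data, which is what allows label co-occurrence statistics to capture hypernymy.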
