Abstract

Pre-trained word embeddings encode general word semantics and lexical regularities of natural language, and have proven useful across many NLP tasks, including word sense disambiguation, machine translation, and sentiment analysis. In supervised tasks such as multiclass text classification (the focus of this article), it seems appealing to enhance word representations with ad-hoc embeddings that encode task-specific information. We propose (supervised) word-class embeddings (WCEs), and show that, when concatenated to (unsupervised) pre-trained word embeddings, they substantially facilitate the training of deep-learning models for multiclass classification by topic. We provide empirical evidence that WCEs yield a consistent improvement in classification accuracy, using six popular neural architectures and six widely used, publicly available multiclass text classification datasets. A further advantage of this method is that it is conceptually simple and straightforward to implement. Our code that implements WCEs is publicly available at https://github.com/AlexMoreo/word-class-embeddings.
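To make the method concrete, the following is a minimal sketch, not the exact implementation or correlation function used in the paper (the linked repository contains that): it builds a |V| × C matrix of WCEs from a simple tf-idf-weighted term-class co-occurrence statistic, and indicates how the result could be concatenated to a matrix of pre-trained embeddings. The function name and the choice of statistic are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import normalize

def word_class_embeddings(docs, labels, n_classes):
    """Build a |V| x n_classes matrix of supervised word-class embeddings.

    Entry (w, c) is a simple term-class correlation score (here: the tf-idf
    mass of word w in training documents of class c), L2-normalized per word.
    This is a sketch of the idea, not the paper's exact correlation function.
    """
    vectorizer = TfidfVectorizer(sublinear_tf=True)
    X = vectorizer.fit_transform(docs)            # shape: (n_docs, |V|)
    Y = np.zeros((len(docs), n_classes))          # one-hot label matrix
    Y[np.arange(len(docs)), labels] = 1.0
    W = np.asarray(X.T @ Y)                       # shape: (|V|, n_classes)
    return normalize(W, norm='l2', axis=1), vectorizer.vocabulary_

# Each word's WCE row can then be concatenated to its pre-trained embedding
# (e.g., a 300-dimensional GloVe vector), and the resulting matrix used to
# initialize the embedding layer of a neural classifier:
# E = np.hstack([pretrained_matrix, wce_matrix])  # rows aligned by vocabulary
```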

Highlights

  • Recent advances in deep learning have led to important improvements in many NLP tasks that deal with the semantic analysis of text, including word sense disambiguation, machine translation, summarization, question answering, and sentiment analysis

  • In this article we propose word-class embeddings (WCEs), a form of supervised word embeddings designed for multiclass text classification that directly model the interactions between terms and class labels

  • Since neural architectures use a random initialization of their parameters, our results for them are averages across 10 runs. (SVMs are deterministic and are excluded from the test of statistical significance, which requires the repetition of random trials.)
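To make the evaluation protocol in the last highlight concrete, the sketch below averages a non-deterministic model's accuracy over 10 differently seeded runs and compares two such models with a paired test over the per-run scores. The training function is a hypothetical placeholder, and the paired t-test merely stands in for whichever significance test is actually applied.

```python
import numpy as np
from scipy.stats import ttest_rel

def repeated_runs(train_and_evaluate, n_runs=10):
    """Return per-run accuracies obtained with n_runs different random seeds.

    `train_and_evaluate` is a placeholder for training one of the neural
    architectures from scratch and returning its test accuracy.
    """
    return np.array([train_and_evaluate(seed=seed) for seed in range(n_runs)])

# acc_wce  = repeated_runs(train_model_with_wces)   # hypothetical trainer
# acc_base = repeated_runs(train_model_baseline)    # hypothetical trainer
# print(acc_wce.mean(), acc_base.mean())            # averages across 10 runs
# stat, p = ttest_rel(acc_wce, acc_base)            # paired test over the runs
```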


Summary

Introduction

Recent advances in deep learning have led to important improvements in many NLP tasks that deal with the semantic analysis of text, including word sense disambiguation, machine translation, summarization, question answering, and sentiment analysis (see [17, 43] for an overview). Many of these improvements rely on word embeddings, i.e., dense, low-dimensional vector representations of words that form the input layer of neural models. Word embeddings can either be initialized randomly and allowed to evolve along with the rest of the model parameters, or be initialized from pre-trained word embeddings obtained offline by scanning massive amounts of textual data. The latter approach is generally preferred, since pre-trained embeddings encode an effective prior that embodies our general-purpose knowledge of the semantics of words, and that can be successfully transferred to (and eventually fine-tuned for) specific application contexts and downstream tasks [24]. Approaches to generating word embeddings typically rely on the distributional hypothesis, according to which words that tend to occur in similar contexts tend to have similar meanings [30]. Word2vec's skip-gram with negative sampling method (SGNS [52]) learns word representations by trying to guess the surrounding words from the observation of the central word in a sliding context window.
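As a concrete illustration of the SGNS setting just described, the sketch below trains skip-gram embeddings with negative sampling using gensim; the library choice, the toy corpus, and all hyperparameter values are illustrative assumptions rather than anything prescribed by the paper.

```python
# Minimal skip-gram with negative sampling (SGNS) example using gensim >= 4.x.
# The toy corpus is a stand-in; in practice SGNS is trained on massive text.
from gensim.models import Word2Vec

corpus = [
    ["pre-trained", "word", "embeddings", "encode", "word", "semantics"],
    ["supervised", "embeddings", "encode", "task", "specific", "information"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=300,   # dimensionality of the embedding space
    window=5,          # sliding context window around the central word
    sg=1,              # 1 = skip-gram: predict surrounding words from the central word
    negative=5,        # number of negative samples per positive (word, context) pair
    min_count=1,       # keep every word in this toy corpus
    epochs=10,
)

vector = model.wv["embeddings"]   # the learned 300-dimensional vector for a word
```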
