Abstract

Word embeddings have introduced a compact and efficient way of representing text for downstream natural language processing (NLP) tasks. Most word embedding algorithms are optimized at the word level; however, many NLP applications require text representations of groups of words, such as sentences or paragraphs. In this paper, we propose a supervised algorithm that produces a task-optimized weighted average of word embeddings. Our text embedding algorithm combines the compactness and expressiveness of word-embedding representations with the word-level insights of a BoW-type model, since each weight corresponds to an actual word. Numerical experiments across different domains demonstrate the competitiveness of our algorithm.
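In notation of our own choosing (the paper's exact formulation and normalization may differ), for a text d = (x_1, ..., x_n) with pre-trained word vectors e(x_i) and learned per-word weights w_{x_i}, such a task-optimized representation can be sketched as

\[
  v(d) = \frac{1}{n} \sum_{i=1}^{n} w_{x_i}\, e(x_i),
\]

where the weights w are fitted for the supervised task and each weight stays attached to an actual vocabulary word, which is the source of the BoW-like interpretability.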

Highlights

  • Word embeddings, i.e., learned mappings from a vocabulary to a vector space, are essential tools for state-of-the-art Natural Language Processing (NLP) techniques

  • Our algorithm provides performance that is better than or comparable to unweighted averaged word embeddings (UAEm) and WAEm

  • Our paper provides an alternative way of representing text at the sentence/document level for supervised text classification, based on optimizing the weights of the words in the text to be classified


Summary

INTRODUCTION

Word embeddings, learned mappings from a vocabulary to a vector space, are essential tools for state-of-the-art Natural Language Processing (NLP) techniques. In this paper we propose a supervised algorithm that produces sentence-level embeddings consisting of a weighted average of an available pre-trained word-level embedding. In an illustrative example with two reviews, the algorithm identifies “romance” and “action” as two important words in the vocabulary for the supervised task and assigns weights with high absolute value to these words. This shifts the representations of the two reviews toward their respective important words in the vector space, increasing the distance between them. Our empirical results show that the proposed representation is, in general, competitive with traditional deep-learning-based text classification approaches and outperforms them when the training data is relatively small. The resulting task-specific text embeddings are as compact as the original word-level embedding while providing word-level insights similar to a BoW-type model.
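A minimal sketch of this idea (our own illustration under stated assumptions, not the authors' reference implementation): frozen pre-trained word vectors, one learnable scalar weight per vocabulary word, and a linear classifier trained on top, written in PyTorch.

    import torch
    import torch.nn as nn

    class WeightedAverageText(nn.Module):
        """Text embedding as a learnable weighted combination of frozen word vectors."""

        def __init__(self, pretrained: torch.Tensor, num_classes: int, pad_id: int = 0):
            super().__init__()
            vocab_size, dim = pretrained.shape
            # Frozen pre-trained word-level embeddings (e.g. word2vec or GloVe vectors).
            self.embed = nn.Embedding.from_pretrained(pretrained, freeze=True)
            # One learnable scalar weight per vocabulary word; a large absolute
            # value marks a word as important for the supervised task.
            self.word_weight = nn.Parameter(torch.ones(vocab_size))
            self.classifier = nn.Linear(dim, num_classes)
            self.pad_id = pad_id

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            # token_ids: (batch, seq_len) word indices, padded with pad_id.
            mask = (token_ids != self.pad_id).float()          # (batch, seq_len)
            vecs = self.embed(token_ids)                       # (batch, seq_len, dim)
            w = self.word_weight[token_ids] * mask             # (batch, seq_len)
            # Length-normalized weighted combination of the word vectors; the
            # paper's exact normalization may differ from this simplification.
            lengths = mask.sum(dim=1, keepdim=True).clamp(min=1.0)
            text_vec = (w.unsqueeze(-1) * vecs).sum(dim=1) / lengths   # (batch, dim)
            return self.classifier(text_vec)

Trained end-to-end with a standard cross-entropy loss, the vector self.word_weight then plays the role of the word-level, BoW-like weights described above; inspecting its largest-magnitude entries recovers the task-relevant words (e.g. “romance” and “action” in the review example).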

RELATED WORK
OPTIMAL WORD EMBEDDINGS
Datasets
Word Embeddings
Text Processing
Results
Text Representation
CONCLUSIONS AND FUTURE WORK
DATA AVAILABILITY STATEMENT