Abstract

Word embedding has benefited a broad spectrum of text analysis tasks by learning distributed word representations that encode word semantics. Word representations are typically learned by modeling local contexts of words, under the assumption that words sharing similar surrounding words are semantically close. We argue that local contexts can only partially define word semantics in unsupervised word embedding learning. Global contexts, which refer to broader semantic units such as the document or paragraph in which the word appears, can capture different aspects of word semantics and complement local contexts. We propose two simple yet effective unsupervised word embedding models that jointly model both local and global contexts to learn word representations. We provide theoretical interpretations of the proposed models to demonstrate how local and global contexts are jointly modeled, assuming a generative relationship between words and contexts. We conduct a thorough evaluation on a wide range of benchmark datasets. Our quantitative analysis and case study show that, despite their simplicity, our two proposed models achieve superior performance on word similarity and text classification tasks.
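To make the distinction above concrete, the minimal Python sketch below (our own illustration, not taken from the paper) enumerates, for each center word in a tokenized corpus, its local context (the surrounding words within a window) and its global context (the identifier of the document or paragraph it appears in); the window size, function name, and toy corpus are hypothetical.

```python
from typing import Iterator, List, Tuple

def iter_contexts(
    documents: List[List[str]],   # each document is a list of tokens
    window: int = 5,              # hypothetical local-window size
) -> Iterator[Tuple[str, List[str], int]]:
    """Yield (center word, local context, global context id) triples.

    The local context is the set of surrounding words within `window`
    positions; the global context is simply the index of the document
    (or paragraph) in which the center word occurs.
    """
    for doc_id, tokens in enumerate(documents):
        for i, center in enumerate(tokens):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            yield center, left + right, doc_id

# Toy usage with a two-document corpus.
corpus = [
    "smoking is harmful to your health".split(),
    "the concert was loud but harmless fun".split(),
]
for center, local_ctx, doc_id in iter_contexts(corpus, window=2):
    print(center, local_ctx, doc_id)
```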

Highlights

  • Unsupervised word representation learning, or word embedding, has shown remarkable effectiveness in various text analysis tasks, such as named entity recognition (Lample et al., 2016), text classification (Kim, 2014), and machine translation (Cho et al., 2014)

  • When we only look at the local context window of the word “harmful,” it is hard to predict whether the center word should have a positive or negative meaning

  • HSMN (Huang et al., 2012), Predictive Text Embedding (PTE) (Tang et al., 2015), and Doc2Cube (Tao et al., 2018) take the average of the word embeddings in a document as the document representation and encourage similarity between the word and document embeddings of co-occurring word–document pairs (see the sketch after this list)
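To illustrate the averaging scheme described in the last highlight, here is a minimal sketch (our own example, not the authors' code) that represents a document as the mean of its word embeddings and scores word–document similarity with a dot product; the toy vocabulary, random embedding matrix, and helper names are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: a small vocabulary and random word embeddings.
vocab = {"smoking": 0, "is": 1, "harmful": 2, "to": 3, "health": 4}
rng = np.random.default_rng(0)
word_emb = rng.normal(size=(len(vocab), 50))   # |V| x d embedding matrix

def document_embedding(tokens, word_emb, vocab):
    """Represent a document as the average of its word embeddings."""
    ids = [vocab[t] for t in tokens if t in vocab]
    return word_emb[ids].mean(axis=0)

def word_document_similarity(word, doc_tokens):
    """Dot-product similarity that such models encourage for a word and a document it occurs in."""
    doc_vec = document_embedding(doc_tokens, word_emb, vocab)
    return float(word_emb[vocab[word]] @ doc_vec)

doc = ["smoking", "is", "harmful", "to", "health"]
print(word_document_similarity("harmful", doc))
```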


Summary

INTRODUCTION

Unsupervised word representation learning, or word embedding, has shown remarkable effectiveness in various text analysis tasks, such as named entity recognition (Lample et al., 2016), text classification (Kim, 2014), and machine translation (Cho et al., 2014). HSMN (Huang et al., 2012), PTE (Tang et al., 2015), and Doc2Cube (Tao et al., 2018) take the average of the word embeddings in a document as the document representation and encourage similarity between the word and document embeddings of co-occurring word–document pairs. These methods do not model global contexts explicitly, because their document representations are essentially aggregated word representations and are not tailored to serve as contextual representations. Moreover, both PTE and Doc2Cube require additional class information for text classification and are therefore not unsupervised word embedding frameworks. We propose two unsupervised models that incorporate both local and global word contexts in word embedding learning, allowing the two kinds of contexts to provide complementary information for capturing word semantics. We also perform case studies to understand the properties of our models.
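For intuition only, the sketch below shows one plausible CBOW-style way to combine the two kinds of contexts: the prediction vector for a center word is the average of its local-window word embeddings plus a learned embedding of the containing document (the global context). This is a hedged illustration under our own assumptions (PyTorch, a full-softmax loss instead of negative sampling, hypothetical dimensions and class names), not the paper's exact Joint CBOW formulation.

```python
import torch
import torch.nn as nn

class JointContextCBOW(nn.Module):
    """CBOW-style model whose prediction vector mixes local and global contexts.

    Hypothetical illustration: the prediction vector is the mean of the
    local-window word embeddings plus the embedding of the containing
    document; a full softmax over the vocabulary is used for brevity.
    """

    def __init__(self, vocab_size: int, num_docs: int, dim: int = 100):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)      # input word vectors
        self.doc_emb = nn.Embedding(num_docs, dim)         # global-context (document) vectors
        self.out = nn.Linear(dim, vocab_size, bias=False)  # output word vectors

    def forward(self, local_ctx_ids, doc_ids):
        # local_ctx_ids: (batch, window) token ids of the surrounding words
        # doc_ids:       (batch,) id of the document each center word appears in
        local = self.word_emb(local_ctx_ids).mean(dim=1)
        joint = local + self.doc_emb(doc_ids)
        return self.out(joint)   # logits over the vocabulary for the center word

# Toy usage: one training step predicting the center word from the joint contexts.
model = JointContextCBOW(vocab_size=1000, num_docs=10, dim=50)
loss_fn = nn.CrossEntropyLoss()
local_ctx = torch.randint(0, 1000, (4, 6))   # batch of 4, window of 6 context words
doc_ids = torch.randint(0, 10, (4,))
centers = torch.randint(0, 1000, (4,))
loss = loss_fn(model(local_ctx, doc_ids), centers)
loss.backward()
```

On the same reading, a Joint Skip-Gram counterpart would work in the opposite direction, predicting both the surrounding words and the containing document from the center word.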

Local Context Word Embedding
Global Context Word Embedding
Joint Context Word Embedding
DEFINITIONS AND PRELIMINARIES
Joint CBOW Model
Joint Skip-Gram Model
INTERPRETING THE MODELS
The Spherical Generative Model
Joint CBOW as Words Generation
Joint Skip-Gram as Contexts Generation
Datasets
Baselines and Ablations
Implementation Details and Settings
Methods
Word Similarity Evaluation
Text Classification Evaluation
Parameter Study
Running Time Study
Case Studies
DISCUSSIONS
CONCLUSIONS AND FUTURE WORK
Findings
DATA AVAILABILITY STATEMENT