Abstract

Word embedding has benefited a broad spectrum of text analysis tasks by learning distributed word representations that encode word semantics. Word representations are typically learned by modeling local contexts of words, under the assumption that words sharing similar surrounding words are semantically close. We argue that local contexts can only partially define word semantics in unsupervised word embedding learning. Global contexts, which refer to broader semantic units such as the document or paragraph in which the word appears, can capture different aspects of word semantics and complement local contexts. We propose two simple yet effective unsupervised word embedding models that jointly model both local and global contexts to learn word representations. We provide theoretical interpretations of the proposed models to demonstrate how local and global contexts are jointly modeled, assuming a generative relationship between words and contexts. We conduct a thorough evaluation on a wide range of benchmark datasets. Our quantitative analysis and case study show that, despite their simplicity, our two proposed models achieve superior performance on word similarity and text classification tasks.
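To make the distinction above concrete, the minimal Python sketch below (our own illustration, not taken from the paper) enumerates, for each center word in a tokenized corpus, its local context (the surrounding words within a window) and its global context (the identifier of the document or paragraph it appears in); the window size, function name, and toy corpus are hypothetical.

```python
from typing import Iterator, List, Tuple

def iter_contexts(
    documents: List[List[str]],   # each document is a list of tokens
    window: int = 5,              # hypothetical local-window size
) -> Iterator[Tuple[str, List[str], int]]:
    """Yield (center word, local context, global context id) triples.

    The local context is the set of surrounding words within `window`
    positions; the global context is simply the index of the document
    (or paragraph) in which the center word occurs.
    """
    for doc_id, tokens in enumerate(documents):
        for i, center in enumerate(tokens):
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            yield center, left + right, doc_id

# Toy usage with a two-document corpus.
corpus = [
    "smoking is harmful to your health".split(),
    "the concert was loud but harmless fun".split(),
]
for center, local_ctx, doc_id in iter_contexts(corpus, window=2):
    print(center, local_ctx, doc_id)
```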

Highlights

  • Unsupervised word representation learning, or word embedding, has shown remarkable effectiveness in various text analysis tasks, such as named entity recognition (Lample et al., 2016), text classification (Kim, 2014), and machine translation (Cho et al., 2014)

  • When we only look at the local context window of the word “harmful,” it is hard to predict whether the center word should have a positive or negative meaning

  • HSMN (Huang et al., 2012), Predictive Text Embedding (PTE) (Tang et al., 2015), and Doc2Cube (Tao et al., 2018) take the average of the word embeddings in a document as the document representation and encourage similarity between the word and document embeddings of co-occurring word–document pairs (see the sketch after this list)
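To illustrate the averaging scheme described in the last highlight, here is a minimal sketch (our own example, not the authors' code) that represents a document as the mean of its word embeddings and scores word–document similarity with a dot product; the toy vocabulary, random embedding matrix, and helper names are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: a small vocabulary and random word embeddings.
vocab = {"smoking": 0, "is": 1, "harmful": 2, "to": 3, "health": 4}
rng = np.random.default_rng(0)
word_emb = rng.normal(size=(len(vocab), 50))   # |V| x d embedding matrix

def document_embedding(tokens, word_emb, vocab):
    """Represent a document as the average of its word embeddings."""
    ids = [vocab[t] for t in tokens if t in vocab]
    return word_emb[ids].mean(axis=0)

def word_document_similarity(word, doc_tokens):
    """Dot-product similarity that such models encourage for a word and a document it occurs in."""
    doc_vec = document_embedding(doc_tokens, word_emb, vocab)
    return float(word_emb[vocab[word]] @ doc_vec)

doc = ["smoking", "is", "harmful", "to", "health"]
print(word_document_similarity("harmful", doc))
```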


Summary

INTRODUCTION

Unsupervised word representation learning, or word embedding, has shown remarkable effectiveness in various text analysis tasks, such as named entity recognition (Lample et al., 2016), text classification (Kim, 2014), and machine translation (Cho et al., 2014). HSMN (Huang et al., 2012), PTE (Tang et al., 2015), and Doc2Cube (Tao et al., 2018) take the average of the word embeddings in a document as the document representation and encourage similarity between the word and document embeddings of co-occurring word–document pairs. These methods do not model global contexts explicitly, because their document representations are essentially aggregated word representations and are not tailored to serve as contextual representations. Moreover, both PTE and Doc2Cube require additional class information for text classification and are therefore not unsupervised word embedding frameworks. We propose two unsupervised models that incorporate both local and global word contexts in word embedding learning, allowing the two kinds of contexts to provide complementary information for capturing word semantics. We also perform case studies to understand the properties of our models.
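For intuition only, the sketch below shows one plausible CBOW-style way to combine the two kinds of contexts: the prediction vector for a center word is the average of its local-window word embeddings plus a learned embedding of the containing document (the global context). This is a hedged illustration under our own assumptions (PyTorch, a full-softmax loss instead of negative sampling, hypothetical dimensions and class names), not the paper's exact Joint CBOW formulation.

```python
import torch
import torch.nn as nn

class JointContextCBOW(nn.Module):
    """CBOW-style model whose prediction vector mixes local and global contexts.

    Hypothetical illustration: the prediction vector is the mean of the
    local-window word embeddings plus the embedding of the containing
    document; a full softmax over the vocabulary is used for brevity.
    """

    def __init__(self, vocab_size: int, num_docs: int, dim: int = 100):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, dim)      # input word vectors
        self.doc_emb = nn.Embedding(num_docs, dim)         # global-context (document) vectors
        self.out = nn.Linear(dim, vocab_size, bias=False)  # output word vectors

    def forward(self, local_ctx_ids, doc_ids):
        # local_ctx_ids: (batch, window) token ids of the surrounding words
        # doc_ids:       (batch,) id of the document each center word appears in
        local = self.word_emb(local_ctx_ids).mean(dim=1)
        joint = local + self.doc_emb(doc_ids)
        return self.out(joint)   # logits over the vocabulary for the center word

# Toy usage: one training step predicting the center word from the joint contexts.
model = JointContextCBOW(vocab_size=1000, num_docs=10, dim=50)
loss_fn = nn.CrossEntropyLoss()
local_ctx = torch.randint(0, 1000, (4, 6))   # batch of 4, window of 6 context words
doc_ids = torch.randint(0, 10, (4,))
centers = torch.randint(0, 1000, (4,))
loss = loss_fn(model(local_ctx, doc_ids), centers)
loss.backward()
```

On the same reading, a Joint Skip-Gram counterpart would work in the opposite direction, predicting both the surrounding words and the containing document from the center word.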

Local Context Word Embedding
Global Context Word Embedding
Joint Context Word Embedding
DEFINITIONS AND PRELIMINARIES
Joint CBOW Model
Joint Skip-Gram Model
INTERPRETING THE MODELS
The Spherical Generative Model
Joint CBOW as Words Generation
Joint Skip-Gram as Contexts Generation
Datasets
Baselines and Ablations
Implementation Details and Settings
Methods
Word Similarity Evaluation
Text Classification Evaluation
Parameter Study
Running Time Study
Case Studies
DISCUSSIONS
CONCLUSIONS AND FUTURE WORK
Findings
DATA AVAILABILITY STATEMENT