Incorporating Prior Knowledge into Word Embedding for Chinese Word Similarity Measurement

Degen Huang, Jianjun Ma, Jiahuan Pei, Kaiyu Huang, Cong Zhang

doi:10.1145/3182622

Abstract

Word embedding-based methods have received increasing attention for their flexibility and effectiveness in many natural language processing (NLP) tasks, including Word Similarity (WS). However, these approaches rely on high-quality corpora and neglect prior knowledge. Lexicon-based methods draw on the human knowledge encoded in semantic resources, e.g., Tongyici Cilin, HowNet, and Chinese WordNet, but they cannot handle unknown words. This article proposes a three-stage framework for measuring Chinese word similarity by incorporating prior knowledge obtained from lexicons and statistics into word embedding. In the first stage, we use retrieval techniques to crawl the contexts of word pairs from web resources and extend the context corpus. In the second stage, we investigate three types of single similarity measurements: lexicon similarities, statistical similarities, and embedding-based similarities. In the final stage, we exploit simple combination strategies based on math operations, as well as a counter-fitting combination strategy based on an optimization method. To demonstrate our system's effectiveness, comparative experiments are conducted on the PKU-500 dataset. Our final results are Spearman/Pearson correlation coefficients of 0.561/0.516, which, to the best of our knowledge, outperform the state of the art. Experimental results on the Chinese MC-30 and SemEval-2012 datasets show that our system also performs well on other Chinese datasets, which demonstrates its transferability. Moreover, our system is not language-specific and can be applied to other languages, e.g., English.
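The combination and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of mean, max, and min as the "math operations", the function names, and the toy scores are all assumptions made for illustration, and the counter-fitting strategy is not covered. The sketch combines per-measure similarity scores for the same word pairs and compares the result with gold ratings using Pearson and Spearman correlation.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


def combine(scores, strategy="mean"):
    """Combine aligned per-measure score lists pair by pair.

    The strategies offered here are assumed examples of simple math
    operations, not necessarily the ones used in the paper.
    """
    ops = {"mean": mean, "max": max, "min": min}
    return [ops[strategy](column) for column in zip(*scores)]


# Hypothetical toy data: four word pairs, gold ratings and two measures.
gold = [9.0, 7.5, 3.0, 1.0]
lexicon = [0.9, 0.6, 0.4, 0.1]
embedding = [0.8, 0.7, 0.2, 0.3]

combined = combine([lexicon, embedding], "mean")
print("Spearman:", spearman(gold, combined))
print("Pearson:", pearson(gold, combined))
```

In this toy data the embedding scores alone misorder the last two pairs, while the averaged scores follow the gold ranking, which is the kind of effect a combination strategy aims for.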

Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: Apr 2, 2018
Citations: 5

Similar Papers

Combining Word Embedding and Semantic Lexicon for Chinese Word Similarity Computation
Jiahuan Pei ... Degen Huang
01 Jan 2015

Combining Large-Scale Unlabeled Corpus and Lexicon for Chinese Polysemous Word Similarity Computation
Huiwei Zhou ... Yunlong Yang
01 Jan 2017

Constructing and validating word similarity datasets by integrating methods from psychology, brain science and computational linguistics
Yu Wan ... Xiaodong Shi
Soft Computing | VOL. 22
03 Apr 2018

A Hybrid Semantic Representation with Internal and External Knowledge for Word Similarity
Yanyan Wang ... Fulian Yin
01 May 2020
