Abstract

Most word representation methods assume that each word owns a single semantic vector. This is usually problematic because lexical ambiguity is ubiquitous, which is also the problem addressed by word sense disambiguation. In this paper, we present a unified model for joint word sense representation and disambiguation, which assigns a distinct representation to each word sense. The basic idea is that word sense representation (WSR) and word sense disambiguation (WSD) can benefit from each other: (1) high-quality WSR will capture rich information about words and senses, which should be helpful for WSD, and (2) high-quality WSD will provide reliable disambiguated corpora for learning better sense representations. Experimental results show that our model improves the performance of contextual word similarity compared to existing WSR methods, outperforms state-of-the-art supervised methods on domain-specific WSD, and achieves competitive performance on coarse-grained all-words WSD.
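As a rough illustration of this mutual reinforcement (not the paper's actual training procedure), the sketch below alternates between disambiguating occurrences of an ambiguous word with the current sense vectors and re-estimating those sense vectors from the disambiguated contexts. All names, the toy vocabulary, and the gloss-based initialization are illustrative assumptions.

```python
# Hypothetical sketch of the WSR/WSD feedback loop described above:
# sense vectors pick senses for corpus occurrences (WSD), and the
# disambiguated occurrences then refine the sense vectors (WSR).
import numpy as np

rng = np.random.default_rng(0)
DIM = 50

# Toy pre-trained word vectors (in practice, learned from a large corpus).
word_vec = {w: rng.normal(size=DIM) for w in
            ["money", "river", "deposit", "water", "loan", "shore"]}

# Toy sense inventory for "bank": each sense initialized from gloss words
# (a real system would draw glosses from a knowledge base such as WordNet).
glosses = {
    "bank#finance": ["money", "deposit", "loan"],
    "bank#river": ["river", "water", "shore"],
}
sense_vec = {s: np.mean([word_vec[w] for w in ws], axis=0)
             for s, ws in glosses.items()}

def context_vector(context_words):
    """Average the vectors of the words surrounding the target word."""
    return np.mean([word_vec[w] for w in context_words], axis=0)

def disambiguate(context_words):
    """WSD step: pick the sense whose vector is most similar to the context."""
    c = context_vector(context_words)
    return max(sense_vec, key=lambda s: np.dot(c, sense_vec[s]) /
               (np.linalg.norm(c) * np.linalg.norm(sense_vec[s])))

# Toy contexts for occurrences of the ambiguous word "bank".
occurrences = [["money", "deposit"], ["river", "water"], ["loan", "money"]]

for _ in range(5):  # alternate WSD and WSR a few times
    labels = [disambiguate(ctx) for ctx in occurrences]            # WSD step
    for sense in sense_vec:                                        # WSR update
        ctxs = [context_vector(c) for c, l in zip(occurrences, labels) if l == sense]
        if ctxs:
            sense_vec[sense] = 0.5 * sense_vec[sense] + 0.5 * np.mean(ctxs, axis=0)

print([disambiguate(ctx) for ctx in occurrences])
```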

Highlights

  • Word representation aims to build vectors for each word based on its context in a large corpus, usually capturing both semantic and syntactic information of words

  • We present a unified model for both word sense representation and disambiguation based on knowledge bases and large-scale text corpora

  • The basic idea is that the tasks of word sense representation (WSR) and word sense disambiguation (WSD) can benefit from each other: (1) high-quality WSR will capture rich semantic and syntactic information about words and senses, which should be helpful for WSD; (2) high-quality WSD will provide reliable disambiguated corpora for learning better sense representations



Introduction

Word representation aims to build vectors for each word based on its context in a large corpus, usually capturing both semantic and syntactic information of words. These representations can be used as features or inputs and are widely employed in information retrieval (Manning et al., 2008), document classification (Sebastiani, 2002), and other NLP tasks. Most word representation methods assume each word owns a single vector. This is usually problematic due to the homonymy and polysemy of many words. To remedy the issue, Reisinger and Mooney (2010) proposed a multi-prototype vector space model, where the contexts of each word are first clustered into groups, and each cluster generates a distinct prototype vector for the word by averaging over all context vectors within the cluster. Huang et al. (2012) followed this idea, but introduced continuous distributed vectors based on probabilistic neural language models for word representations.
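The clustering-and-averaging idea behind the multi-prototype model can be sketched roughly as follows; the k-means choice, the toy context vectors, and the fixed number of prototypes per word are illustrative assumptions rather than details of Reisinger and Mooney's original setup.

```python
# Rough sketch of a multi-prototype word representation:
# cluster the context vectors of one word, then average each cluster
# into a distinct prototype vector (illustrative only).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Toy context vectors for occurrences of a single ambiguous word
# (in practice these are built from the words around each occurrence).
context_vectors = np.vstack([
    rng.normal(loc=+1.0, size=(20, 50)),   # occurrences in one usage
    rng.normal(loc=-1.0, size=(20, 50)),   # occurrences in another usage
])

K = 2  # assumed number of prototypes for this word
clustering = KMeans(n_clusters=K, n_init=10, random_state=0).fit(context_vectors)

# Each prototype is the mean of the context vectors assigned to its cluster.
prototypes = np.array([
    context_vectors[clustering.labels_ == k].mean(axis=0) for k in range(K)
])
print(prototypes.shape)  # (2, 50): two prototype vectors for the word
```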

