Abstract

We propose a lightweight algorithm for learning word single-meaning embeddings (WSME), which exploits WordNet synsets and Doc2vec document embeddings to enhance the accuracy of semantic relatedness measurement. In our model, each polyseme is decomposed into a series of monosemous words carrying distinct WordNet synset tags that represent its different meanings, so that each word meaning corresponds to exactly one vector. Our algorithm proceeds in three steps. First, each polyseme is disambiguated in context by computing the relatedness between its context and each of its candidate meaning definitions in WordNet, and selecting the most related one. Second, each tagged word is lemmatized according to its synset tag to alleviate the word sparsity caused by polyseme decomposition. Third, the word single-meaning embeddings are learned from the meaning-tagged corpus, and the semantic relatedness between words can then be measured more accurately from these embeddings. Our experimental results show that our algorithm achieves better performance on semantic relatedness measurement than existing techniques.
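As a rough illustration of the first step, the Python sketch below disambiguates a polyseme by scoring each of its candidate WordNet definitions against the surrounding context. Plain token overlap stands in here for the paper's relatedness measure, which also uses word order information and Doc2vec semantics; the function name and tokenization choices are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of step 1 (WSD): pick the WordNet synset whose definition
# is most related to the polyseme's context. Token overlap is used here
# only as a simple proxy for the paper's richer relatedness measure.
from nltk.corpus import wordnet as wn
from nltk.tokenize import word_tokenize

def disambiguate(word, context_sentence):
    context = set(word_tokenize(context_sentence.lower()))
    best_synset, best_score = None, -1
    for synset in wn.synsets(word):
        gloss = set(word_tokenize(synset.definition().lower()))
        score = len(context & gloss)   # overlap as a relatedness proxy
        if score > best_score:
            best_synset, best_score = synset, score
    return best_synset

sense = disambiguate("bank", "He moored the boat on the bank of the river.")
print(sense.name(), "-", sense.definition())
```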

Highlights

  • Semantic relatedness is a metric that quantifies how strongly two concepts are related

  • Each meaning-tagged word is lemmatized according to its synset tag, so that decomposed monosemes with different morphological forms but the same meaning are converted to a unified form; this raises their term frequency enough to fulfil the requirements of word embedding training

  • The semantic relatedness between two words is computed from the vectors of the lemmatized word meanings together with a likelihood based on the document embeddings and the word embeddings (see the sketch after this list)

  • We present a lightweight algorithm for learning word single-meaning embeddings, a lexicon-based multi-prototype word embedding model built on the WordNet lexicon and the Doc2vec algorithm
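The sketch below illustrates the embedding-and-measurement idea from the highlights: train Doc2vec on a meaning-tagged, lemmatized corpus and compare sense vectors by cosine similarity. The "synset|lemma" token format and the synset identifiers are hypothetical stand-ins for the paper's tagging scheme, and cosine similarity alone does not reproduce the paper's likelihood-based measure.

```python
# Hedged sketch of step 3: learn embeddings from a meaning-tagged corpus.
# Tokens like "bank.n.01|bank" are an illustrative format, not the paper's.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

tagged_corpus = [
    ["deposit.v.02|deposit", "cash.n.01|cash", "bank.n.01|bank"],  # financial sense
    ["moor.v.01|moor", "boat.n.01|boat", "bank.n.09|bank"],        # river-bank sense
]
docs = [TaggedDocument(words=w, tags=[str(i)]) for i, w in enumerate(tagged_corpus)]
model = Doc2Vec(docs, vector_size=50, window=2, min_count=1, epochs=100)

# With the distributed-memory model (dm=1, the default), Doc2vec also learns
# word vectors, exposed via model.wv; sense relatedness is then a cosine score.
print(model.wv.similarity("bank.n.01|bank", "cash.n.01|cash"))
```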

Summary

INTRODUCTION

Semantic relatedness is a metric that quantifies how strongly two concepts are related. Computing the relatedness between concepts has become a foundation for many natural language processing (NLP) applications, such as text classification [1], information retrieval [2], entity linking [3], and question answering [4]. First, we propose a novel WSD algorithm that considers both word order information and word semantic information. On this basis, we can adequately determine the specific meaning of a polyseme appearing in different contexts and learn higher-quality word single-meaning embeddings than other multi-prototype word embedding models. We prefix each polyseme with a WordNet synset tag to represent one of its meanings, and further lemmatize the meaning-tagged words based on the current synset tag. In this way, decomposed monosemes with different morphological forms but the same meaning can be converted to a unified form, and the term frequency can be increased to fulfil the requirements of word embedding training, as sketched below.
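A minimal sketch of this tagging-and-lemmatization idea follows, again assuming a hypothetical "synset|lemma" token format: the synset's canonical lemma replaces whatever surface form appeared in the text, so inflected variants of the same sense collapse into one vocabulary entry.

```python
# Hedged sketch of meaning tagging + synset-aware lemmatization; the
# "synset.name()|lemma" format is an illustrative assumption.
from nltk.corpus import wordnet as wn

def tag_and_lemmatize(synset):
    # Use the synset's first lemma as the canonical form for this sense,
    # so e.g. "bank" and "banks" tagged with the same synset unify.
    return f"{synset.name()}|{synset.lemmas()[0].name()}"

# wn.synsets() applies WordNet's morphy, so the inflected form "banks"
# still resolves to synsets of the base form "bank".
for surface in ("bank", "banks"):
    synset = wn.synsets(surface, pos=wn.NOUN)[0]
    print(surface, "->", tag_and_lemmatize(synset))  # same unified token
```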

RELATED WORK
SEMANTIC RELATEDNESS MEASUREMENT STRATEGY FOR WSME
EXPERIMENTS AND RESULTS
EVALUATION METRIC
CONCLUSION