Abstract

Sequence labeling models for word sense disambiguation have proven highly effective when the sense vocabulary is compressed based on the thesaurus hierarchy. In this paper, we propose a method for compressing the sense vocabulary without using a thesaurus. Sense definitions in a dictionary are converted into sentence vectors and clustered into compressed senses. First, the very large set of sense vectors is partitioned to reduce computational complexity, and then it is clustered hierarchically with awareness of homographs. Experiments were conducted on the English Senseval and SemEval datasets and the Korean Sejong sense-annotated corpus. The results show that performance increased greatly compared with the uncompressed sense model and is comparable to that of the thesaurus-based model.
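
The pipeline described above (embed definitions, partition, then cluster hierarchically while respecting homographs) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy sense inventory, the bag-of-words `embed` function standing in for a real sentence encoder, partitioning by part of speech, average-link merging with a cosine threshold, and the reading of "awareness of homographs" as a rule that two senses of the same word never share a cluster.

```python
import math
from collections import Counter, defaultdict

# Toy sense inventory (invented): (sense_id, lemma, part_of_speech, definition).
SENSES = [
    ("bank.1", "bank", "n", "a financial institution that accepts deposits of money"),
    ("bank.2", "bank", "n", "sloping land beside a river"),
    ("shore.1", "shore", "n", "land beside a sea or river"),
    ("fund.1", "fund", "n", "a financial institution that manages money"),
    ("run.1", "run", "v", "move fast on foot"),
    ("sprint.1", "sprint", "v", "move very fast on foot over a short distance"),
]

def embed(text):
    """Assumed stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def cluster_partition(items, threshold):
    """Average-link agglomerative clustering over (sense_id, lemma, vector) items.

    Assumed homograph rule: two clusters are never merged if they both
    contain a sense of the same lemma.
    """
    clusters = [[item] for item in items]
    while True:
        best, best_pair = threshold, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                lemmas_i = {lemma for _, lemma, _ in clusters[i]}
                lemmas_j = {lemma for _, lemma, _ in clusters[j]}
                if lemmas_i & lemmas_j:
                    continue  # would put two senses of one word in one cluster
                sims = [cosine(a[2], b[2]) for a in clusters[i] for b in clusters[j]]
                score = sum(sims) / len(sims)
                if score >= best:
                    best, best_pair = score, (i, j)
        if best_pair is None:
            return clusters  # no remaining pair reaches the threshold
        i, j = best_pair
        clusters[i].extend(clusters.pop(j))

def compress(senses, threshold=0.4):
    """Map each sense_id to a compressed sense label."""
    # Partition first so the quadratic pairwise search runs on smaller sets.
    partitions = defaultdict(list)
    for sense_id, lemma, pos, definition in senses:
        partitions[pos].append((sense_id, lemma, embed(definition)))
    mapping = {}
    for pos, items in partitions.items():
        for k, cluster in enumerate(cluster_partition(items, threshold)):
            for sense_id, _, _ in cluster:
                mapping[sense_id] = f"{pos}.c{k}"
    return mapping

mapping = compress(SENSES)
for sense_id in sorted(mapping):
    print(sense_id, "->", mapping[sense_id])
```

In this toy run the six fine-grained senses collapse into three compressed labels, while the two senses of "bank" stay in different clusters, so a tagger trained on the compressed vocabulary could still tell them apart.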

Highlights

  • Word sense disambiguation (WSD), i.e., finding the correct sense of a word in a given context, has long been a challenge in natural language understanding.

  • The method proposed in this paper is one practical solution; we leave to future work the task of finding clustering algorithms for sense compression that have lower computational complexity.

  • This paper proposes a clustering method to develop compressed sense vocabularies from sense definition vectors.

Summary

Introduction

Word sense disambiguation (WSD), i.e., finding the correct sense of a word in a given context, has long been a challenge in natural language understanding. It has mostly been studied with machine learning models, typically using supervised, unsupervised, and knowledge-based approaches [1,2,3,4]. The knowledge-based approach [9,10] uses glossary information, usually from a dictionary or thesaurus, and matches it against the context of a target word. Kumar et al. [17] extended this idea by incorporating thesaurus information into the sense definitions and merging them into a continuous vector representation. This approach requires a large amount of memory to process all the senses of homographs and their related glossaries.

Sense Definition Clustering
Deep-Learning Model for Word Sense Disambiguation
Experiment Setting
Experiment Result
Findings
Discussion
Conclusions
