Abstract
Sequence labeling models for word sense disambiguation have proven highly effective when the sense vocabulary is compressed based on the thesaurus hierarchy. In this paper, we propose a method for compressing the sense vocabulary without using a thesaurus. Sense definitions in a dictionary are converted into sentence vectors and clustered into compressed senses. The very large set of sense vectors is first partitioned to reduce computational complexity and then clustered hierarchically with awareness of homographs. Experiments were conducted on the English Senseval and SemEval datasets and the Korean Sejong sense-annotated corpus. The results show that performance increases greatly compared to the uncompressed sense model and is comparable to that of the thesaurus-based model.
Highlights
Word sense disambiguation (WSD), i.e., finding the correct sense of a word in a given context, has long been a challenge in natural language understanding.
What we have proposed in this paper is one practical solution; we leave to future work the task of finding clustering algorithms for sense compression with lower computational complexity.
This paper proposes a clustering method to develop compressed sense vocabularies from sense definition vectors.
Summary
Word sense disambiguation (WSD), i.e., finding the correct sense of a word in a given context, has long been a challenge in natural language understanding. It has mostly been studied with machine learning models, typically following supervised, unsupervised, or knowledge-based approaches [1,2,3,4]. The knowledge-based approach [9,10] uses glossary information, usually from a dictionary or thesaurus, and matches it against the context of a target word. Kumar et al. [17] extended this idea by incorporating thesaurus information into the sense definitions and merging them into a continuous vector representation. This approach requires a large amount of memory to process all the senses of homographs and their related glossaries.
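The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that "awareness of homographs" means two senses of the same lemma are never placed in the same compressed sense, and it uses average-linkage agglomerative clustering over cosine similarity with a hypothetical merge threshold. The function name `compress_senses`, the toy three-dimensional vectors, and the sense identifiers are invented for illustration, and the partitioning step is omitted.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def compress_senses(senses, threshold=0.8):
    """Greedy average-linkage clustering of sense definition vectors.

    senses: dict mapping sense_id -> (lemma, vector).
    threshold: hypothetical minimum average similarity required to merge.
    Assumption: clusters that share a lemma are never merged, so the senses
    of one homograph stay distinguishable after compression.
    """
    clusters = [{sense_id} for sense_id in senses]

    def lemmas(cluster):
        return {senses[s][0] for s in cluster}

    def similarity(a, b):
        pairs = [(x, y) for x in a for y in b]
        total = sum(cosine(senses[x][1], senses[y][1]) for x, y in pairs)
        return total / len(pairs)

    while True:
        best, best_pair = threshold, None
        for i, j in combinations(range(len(clusters)), 2):
            if lemmas(clusters[i]) & lemmas(clusters[j]):
                continue  # would merge two senses of the same word
            s = similarity(clusters[i], clusters[j])
            if s >= best:
                best, best_pair = s, (i, j)
        if best_pair is None:
            return clusters
        i, j = best_pair  # i < j, so deleting index j leaves index i valid
        clusters[i] |= clusters[j]
        del clusters[j]


# Toy definition vectors (made up for illustration, not real embeddings).
senses = {
    "bank.1": ("bank", [0.9, 0.1, 0.0]),                # financial institution
    "bank.2": ("bank", [0.0, 0.2, 0.9]),                # side of a river
    "depository.1": ("depository", [0.85, 0.15, 0.05]),
    "shore.1": ("shore", [0.05, 0.1, 0.95]),
}
compressed = compress_senses(senses)
print(sorted(sorted(cluster) for cluster in compressed))
```

In this toy run the four senses collapse into two compressed senses, while the two senses of "bank" remain in different clusters. For a dictionary-scale vocabulary, the pairwise search above would be too slow, which is presumably why the abstract partitions the sense vectors before clustering.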