Abstract

Supervised word sense disambiguation (WSD) systems usually perform best when evaluated on standard benchmarks. However, these systems need annotated training data to function properly. While some open source WSD systems are publicly available, very few large annotated datasets are available to the research community. This paper has two main goals: to extract and annotate a large number of samples and release them for public use, and to evaluate this dataset on several word sense disambiguation and induction tasks. We show that the open source IMS WSD system trained on our dataset achieves state-of-the-art results on standard disambiguation tasks and on a recent word sense induction task, outperforming several task submissions and strong baselines.

Highlights

  • Identifying the meaning of a word automatically has been an interesting research topic for a few decades

  • The approaches used to solve this problem can be roughly categorized into two main classes: Word Sense Disambiguation (WSD) and Word Sense Induction (WSI) (Navigli, 2009)

  • Since the main purpose of this paper is to build and release a publicly available training set for word sense disambiguation systems, we selected the MultiUN corpus (MUN) (Eisele and Chen, 2010), produced in the EuroMatrixPlus project.


Summary

Introduction

Identifying the meaning of a word automatically has been an interesting research topic for a few decades. There are several sense-annotated datasets for WSD (Miller et al., 1993; Ng and Lee, 1996; Passonneau et al., 2012). However, these datasets either include few samples per word sense or cover only a small set of polysemous words. To overcome these limitations, automatic methods have been developed for annotating training samples. Diab (2004) proposed an unsupervised bootstrapping method to automatically generate a sense-annotated dataset. Another approach to creating such datasets automatically is the semi-supervised method of Kübler and Zhekova (2009), which employed a supervised classifier to label instances.
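The idea of using a supervised classifier to label new instances can be illustrated with a generic self-training loop. This is a minimal sketch under stated assumptions, not the setup of Kübler and Zhekova (2009), whose classifier, features and selection criterion are not described here. It assumes a bag-of-words multinomial Naive Bayes classifier, a fixed confidence threshold, and a toy two-sense example for "bank"; the function names (`train_nb`, `predict_nb`, `self_train`), the sense labels and the data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Fit a multinomial Naive Bayes model on (context_words, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab


def predict_nb(model, words):
    """Return (most likely sense, its posterior) using add-one smoothing."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    log_scores = {}
    for sense, count in sense_counts.items():
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(count / total)
        for w in words:
            score += math.log((word_counts[sense][w] + 1) / denom)
        log_scores[sense] = score
    # Turn log scores into normalized posteriors (log-sum-exp for stability).
    top = max(log_scores.values())
    norm = sum(math.exp(s - top) for s in log_scores.values())
    best = max(log_scores, key=log_scores.get)
    return best, math.exp(log_scores[best] - top) / norm


def self_train(labeled, unlabeled, threshold=0.75, max_rounds=5):
    """Hypothetical self-training: add confident predictions, then retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train_nb(labeled)
        confident, uncertain = [], []
        for words in pool:
            sense, prob = predict_nb(model, words)
            if prob >= threshold:
                confident.append((words, sense))
            else:
                uncertain.append(words)
        if not confident:  # nothing new was labeled; stop early
            break
        labeled.extend(confident)
        pool = uncertain
    return labeled


# Toy usage: two hand-labeled seed contexts and three unlabeled ones.
seed = [
    (["river", "water", "fishing"], "bank/shore"),
    (["money", "loan", "deposit"], "bank/finance"),
]
unlabeled = [
    ["water", "river", "boat"],
    ["loan", "money", "interest"],
    ["weather"],
]
result = self_train(seed, unlabeled, threshold=0.75)
for words, sense in result:
    print(sense, words)
```

In this toy run the two contexts that overlap with the seeds are labeled and added, while the uninformative context `["weather"]` stays unlabeled because the classifier assigns it equal probability for both senses. The threshold trades coverage for label quality: a lower value adds more instances but also more labeling errors, which is the main risk of automatically annotated training data.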

