Abstract
We propose a simple graph-based method for word sense disambiguation (WSD) in which sense and context embeddings are constructed by applying the Skip-gram method to random walks over the sense graph. We used this method to build a WSD system for Swedish based on the SALDO lexicon and evaluated it on six different annotated test sets. In all cases, our system was several orders of magnitude faster than a state-of-the-art PageRank-based system, while soundly outperforming a random baseline.
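The sketch below illustrates the two-embedding idea. It makes three assumptions: Skip-gram is trained with negative sampling, walk nodes and context tokens share a single vocabulary, and a word is disambiguated by picking the candidate sense whose target vector has the highest dot product with the summed context vectors. That scoring rule is an assumption, not necessarily the authors' procedure. The names `train_skipgram` and `disambiguate`, the toy walks, and all hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Clamp the input so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_skipgram(walks, dim=8, window=2, negatives=3, epochs=20, lr=0.05, seed=0):
    """Skip-gram with negative sampling (SGNS) over walk sequences.

    Returns two tables: `target` vectors (read here as sense embeddings)
    and `context` vectors (read here as context embeddings).
    """
    rng = random.Random(seed)
    vocab = sorted({node for walk in walks for node in walk})
    target = {w: [(rng.random() - 0.5) / dim for _ in range(dim)] for w in vocab}
    context = {w: [0.0] * dim for w in vocab}
    for _ in range(epochs):
        for walk in walks:
            for i, center in enumerate(walk):
                v = target[center]
                for j in range(max(0, i - window), min(len(walk), i + window + 1)):
                    if j == i:
                        continue
                    # One observed (positive) pair plus `negatives` random (negative) pairs.
                    pairs = [(walk[j], 1.0)]
                    pairs += [(rng.choice(vocab), 0.0) for _ in range(negatives)]
                    grad_v = [0.0] * dim
                    for ctx, label in pairs:
                        c = context[ctx]
                        g = lr * (label - sigmoid(dot(v, c)))
                        for k in range(dim):
                            grad_v[k] += g * c[k]  # uses c before its own update
                            c[k] += g * v[k]
                    # Apply the accumulated update to the target vector.
                    for k in range(dim):
                        v[k] += grad_v[k]
    return target, context


def disambiguate(candidate_senses, context_words, target, context):
    """Return the candidate sense whose target vector has the highest dot
    product with the sum of the context words' context vectors."""
    dim = len(next(iter(target.values())))
    ctx_sum = [0.0] * dim
    for w in context_words:
        for k, x in enumerate(context.get(w, [0.0] * dim)):
            ctx_sum[k] += x
    return max(candidate_senses, key=lambda s: dot(target[s], ctx_sum))


# Hypothetical toy walks mixing sense nodes and plain context tokens.
walks = [["bank#1", "money", "loan"], ["bank#2", "river", "shore"]] * 5
target, context = train_skipgram(walks)
print(disambiguate(["bank#1", "bank#2"], ["river"], target, context))
```

Keeping two separate tables mirrors standard Skip-gram, where input and output vectors are learned independently. A pair of senses that rarely appear together can therefore still be compared through their shared contexts.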
Highlights
Word sense disambiguation (WSD) is a difficult task for automatic systems (Navigli, 2009).
We built a WSD system for Swedish by applying the random walk-based training described above to the SALDO lexicon (Borin et al., 2013). We evaluated this system on six different annotated corpora, in which the ambiguous words have been manually disambiguated according to SALDO, and compared it to random and first-sense baselines and to UKB (Agirre and Soroa, 2009), a state-of-the-art graph-based WSD system.
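The two baselines can be sketched as follows. The sketch assumes that each test instance is a list of candidate senses plus the gold sense, and that the first-sense baseline simply picks the sense listed first for the lemma in the lexicon. The helper names and the toy instances are hypothetical. Because a uniform guess among k candidates is correct with probability 1/k, the expected accuracy of the random baseline is the mean of 1/k over all instances.

```python
import random


def random_baseline(candidates, rng):
    # Pick one candidate sense uniformly at random.
    return rng.choice(candidates)


def first_sense_baseline(candidates):
    # Assumes candidates are ordered as listed in the lexicon.
    return candidates[0]


def accuracy(instances, predict):
    """instances: list of (candidate_senses, gold_sense) pairs."""
    correct = sum(1 for cands, gold in instances if predict(cands) == gold)
    return correct / len(instances)


def expected_random_accuracy(instances):
    # A uniform guess among k candidates is right with probability 1/k.
    return sum(1.0 / len(cands) for cands, _ in instances) / len(instances)


# Hypothetical toy instances: (candidate senses, gold sense).
toy = [(["bank#1", "bank#2"], "bank#1"), (["plant#1", "plant#2", "plant#3"], "plant#3")]
rng = random.Random(0)
print(accuracy(toy, first_sense_baseline))  # 0.5
print(expected_random_accuracy(toy))
print(accuracy(toy, lambda c: random_baseline(c, rng)))
```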
A model is trained on synthetic datasets compiled from random walks on SALDO.
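The following sketch shows one way such a synthetic dataset could be generated. It assumes uniform random walks of fixed length started from every node of an adjacency-list graph. The actual walk strategy and the structure of the SALDO graph may differ. The function `random_walks`, its parameters, and the toy graph are hypothetical.

```python
import random


def random_walks(graph, walks_per_node=10, walk_length=6, seed=0):
    """Generate uniform random walks starting from every node.

    graph: dict mapping each node to a list of neighbouring nodes
    (undirected edges are listed in both directions).
    """
    rng = random.Random(seed)
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbours = graph[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical toy sense graph.
toy_graph = {
    "bank#1": ["money#1", "loan#1"],
    "money#1": ["bank#1"],
    "loan#1": ["bank#1"],
}
for walk in random_walks(toy_graph, walks_per_node=1, walk_length=4):
    print(" ".join(walk))
```

Each walk then plays the role of a "sentence" in the training data for the Skip-gram model.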
Summary
Word sense disambiguation (WSD) is a difficult task for automatic systems (Navigli, 2009). Several methods use lexical knowledge bases (LKBs) for WSD (Navigli and Lapata, 2007; Agirre and Soroa, 2009). These approaches usually disambiguate a target word by applying a relatively complex analysis of the underlying graph based on the word's context; for example, Agirre and Soroa (2009) use the Personalized PageRank algorithm to perform walks on the graph. Such methods are computationally very costly, which makes them impractical for large corpora.
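For comparison, here is a generic Personalized PageRank computed by power iteration. This is not UKB's implementation. The graph representation, the choice of seed nodes (for instance, the senses of the context words), the damping factor of 0.85, and the fixed iteration count are all assumptions, and `personalized_pagerank` is a hypothetical name. A computation like this sweeps the whole graph repeatedly for each context to be disambiguated, which suggests where the cost described above comes from.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=30):
    """Personalized PageRank by power iteration.

    graph: dict mapping each node to a list of out-neighbours
    (every neighbour must itself be a key of the dict).
    seeds: nodes of the graph on which the teleport distribution is concentrated.
    """
    seeds = set(seeds)
    nodes = list(graph)
    teleport = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    rank = dict(teleport)
    for _ in range(iterations):
        # With probability 1 - damping, jump back to a seed node.
        new = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            out = graph[n]
            if out:
                # Otherwise follow an out-edge chosen uniformly.
                share = damping * rank[n] / len(out)
                for m in out:
                    new[m] += share
            else:
                # Dangling node: return its mass via the teleport distribution.
                for m in nodes:
                    new[m] += damping * rank[n] * teleport[m]
        rank = new
    return rank


# Hypothetical toy graph with a single seed node.
toy = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print({n: round(s, 3) for n, s in personalized_pagerank(toy, {"a"}).items()})
```

Unlike the embedding approach, nothing here can be precomputed once for all contexts, because the seeds change with every context.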