Abstract

Word sense disambiguation (WSD) is considered a difficult problem in computational linguistics. Single-approach solutions consisting of only one module are unlikely to yield high performance, while hybrid systems formed by combining several modules tend to perform better on WSD tasks. We propose the strong POS + Frequency baseline as a simple, easy-to-implement platform for testing how well algorithms can do when combined with other high-accuracy modules. After giving an overview of the field, we present our novel model for WSD, a stand-alone contribution in a field in which ideas are often recycled. Under the umbrella of this model, called the Sense Space Model (SSM), we show that significant and interesting algorithms exist. While some of the model's unsupervised offspring algorithms can have low accuracy compared to the strong POS + Frequency baseline (and to the top hybrid systems), sometimes even lower than that of a random system, such algorithms can still perform significantly better than a random system when combined with the strong baseline, at a strict 1% significance level. This challenges the practice of ruling such lower-accuracy modules out of a hybrid system, an elimination that might otherwise appear necessary. One of these significant algorithms was recently improved by introducing "a threshold" and was able to beat the implemented POS + Frequency baseline, confirming that it is reasonable to regard such lower-accuracy algorithms as significant.
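For reference, the sketch below shows one minimal way a POS + Frequency baseline of the kind described above could look, assuming WordNet (via NLTK) as the sense inventory; the function name `pos_frequency_baseline` is illustrative and not taken from the paper. The idea is simply to pick the most frequent sense of the target lemma that matches its part of speech.

```python
# A minimal sketch of a "POS + Frequency" WSD baseline, assuming WordNet
# as the sense inventory. NLTK returns synsets ordered by corpus frequency,
# so the first synset matching the POS is the most frequent sense.
# Requires: nltk.download('wordnet')
from nltk.corpus import wordnet as wn

def pos_frequency_baseline(lemma, pos):
    """Return the most frequent WordNet sense of `lemma` with POS `pos`.

    `pos` is a WordNet POS tag (wn.NOUN, wn.VERB, wn.ADJ, or wn.ADV).
    Returns None when the lemma has no sense with that POS.
    """
    synsets = wn.synsets(lemma, pos=pos)  # ordered most-frequent first
    return synsets[0] if synsets else None

# Example: disambiguate "bank" tagged as a noun.
sense = pos_frequency_baseline("bank", wn.NOUN)
if sense is not None:
    print(sense.name(), "-", sense.definition())
```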
