Abstract Word sense disambiguation (WSD) is the task of selecting the correct sense of an ambiguous word in its context. Since WSD is one of the most challenging tasks in many text processing systems, improving its accuracy can be very beneficial. In this article, we propose a new unsupervised method based on a co-occurrence graph built from a monolingual corpus, with no dependency on the structure or properties of the language itself. In the proposed method, the context of an ambiguous word is represented as a sub-graph extracted from a large word co-occurrence graph built from a corpus. Most of the words are connected in this graph. To determine the exact sense of an ambiguous word, its senses and their relations are added to the context graph, and various similarity functions are computed between the senses and the context graph. In the disambiguation process, we select the senses with the highest similarity to the context graph. Unlike other WSD methods, the proposed method does not use any language-dependent resources (e.g. WordNet) and relies only on a monolingual corpus. Therefore, the proposed method can be applied to other languages. Moreover, the accuracy of WSD can be improved by increasing the size of the corpus. Experimental results on English and Persian datasets show that the proposed method is competitive with existing supervised and unsupervised WSD approaches.
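The pipeline described above can be illustrated with a minimal sketch. It is not the implementation evaluated in the article: the toy corpus, the sentence-level co-occurrence counting, the hand-written sense word sets, and the summed-edge-weight similarity are all assumptions made for illustration, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(sentences):
    """Count how often each unordered word pair shares a sentence.

    Assumption: sentence-level co-occurrence; the abstract does not
    specify the window used to build the graph.
    """
    graph = Counter()
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            graph[frozenset((a, b))] += 1
    return graph


def similarity(graph, sense_words, context_words):
    """Sum the edge weights linking a sense's words to the context words.

    Assumption: one simple similarity function among many possible ones.
    """
    return sum(
        graph[frozenset((s, c))]
        for s in sense_words
        for c in context_words
        if s != c
    )


def disambiguate(graph, senses, context_words):
    """Return the highest-scoring sense label and all sense scores."""
    scores = {
        label: similarity(graph, words, context_words)
        for label, words in senses.items()
    }
    return max(scores, key=scores.get), scores


# Toy corpus, already tokenized with stop words removed (hypothetical data).
corpus = [
    ["bank", "approved", "loan", "money"],
    ["deposited", "money", "bank"],
    ["river", "bank", "water", "mud"],
    ["fish", "swim", "river", "water"],
]

# Hypothetical sense descriptions for the ambiguous word "bank".
senses = {
    "bank/finance": {"money", "loan"},
    "bank/river": {"river", "water"},
}

# Context of the ambiguous word: "he sat on the bank and watched fish in the water"
context_words = ["sat", "watched", "fish", "water"]

graph = build_cooccurrence_graph(corpus)
best, scores = disambiguate(graph, senses, context_words)
print(best)  # prints: bank/river
```

In this sketch the sense whose words are most strongly linked to the context words in the co-occurrence graph wins, which mirrors the selection step of the abstract only at a very coarse level.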