NAIVE BAYES CLASSIFIER FOR WORD SENSE DISAMBIGUATION OF PUNJABI LANGUAGE

Varinder Pal Singh, Parteek Kumar

doi:10.22452/mjcs.vol31no3.2

Abstract

Word Sense Disambiguation (WSD) is the process of identifying the correct sense of a word in its context. The leading approach to WSD is machine learning, in which a human expert provides examples of correctly disambiguated words and a learning algorithm induces a model from these examples. In this paper, a supervised Naive Bayes classifier is used to disambiguate words of the Punjabi language. Feature extraction plays a vital role in building supervised machine learning models. For the proposed Punjabi WSD system, Bag of Words (BoW) and collocation models are used separately to extract relevant features. The BoW model uses all words around the target word as features, while the collocation model uses the two words before and the two words after the target word. Both models are built from a common training data set. The choice of smoothing parameter for Naive Bayes was observed to have a significant impact on its performance. The proposed work has been tested on the 150 most ambiguous nouns selected from Punjabi WordNet, each having 6 or more senses. While building the model, fine-grained senses of ambiguous words were merged into coarse senses on the basis of a manual analysis of the lexical relations in WordNet. Accuracy was calculated independently for the two feature models: the proposed WSD system achieves 89% with the BoW model and 81% with the collocation model. It is concluded that the BoW model performs better than the collocation model for the WSD task in Punjabi.
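
The two feature models and the role of the smoothing parameter can be illustrated with a minimal sketch. This is not the authors' implementation. The function and class names, the additive (Laplace-style) smoothing scheme, the choice to ignore features unseen in training, and the English toy sentences standing in for Punjabi training data are all assumptions made for illustration. The abstract does not state which smoothing method or parameter value was used.

```python
import math
from collections import Counter, defaultdict


def bow_features(tokens, target_index):
    """Bag of Words: every context token except the target word itself."""
    return [t for i, t in enumerate(tokens) if i != target_index]


def collocation_features(tokens, target_index, window=2):
    """Collocation: up to `window` words on each side, tagged with their offset."""
    feats = []
    for offset in range(-window, window + 1):
        pos = target_index + offset
        if offset != 0 and 0 <= pos < len(tokens):
            feats.append(f"{offset:+d}:{tokens[pos]}")
    return feats


class NaiveBayesWSD:
    """Multinomial Naive Bayes with additive smoothing (alpha must be > 0)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing parameter (assumed additive smoothing)
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, examples):
        """examples: iterable of (feature_list, sense_label) pairs."""
        for feats, sense in examples:
            self.sense_counts[sense] += 1
            self.feature_counts[sense].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        """Return the sense with the highest log posterior score."""
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocab)
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[sense].values()) + self.alpha * vocab_size
            for f in feats:
                if f in self.vocab:  # assumption: skip features never seen in training
                    score += math.log((self.feature_counts[sense][f] + self.alpha) / denom)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Hypothetical toy data: (tokens, index of the ambiguous word, coarse sense label).
train = [
    (["river", "bank", "water", "flows"], 1, "shore"),
    (["deposit", "money", "bank", "account"], 2, "finance"),
    (["the", "bank", "of", "the", "river"], 1, "shore"),
    (["bank", "loan", "interest", "money"], 0, "finance"),
]
test_tokens, test_index = ["money", "in", "bank", "account"], 2

# Both models are trained on the same data; only the feature extractor differs.
bow_model = NaiveBayesWSD(alpha=1.0).fit(
    (bow_features(toks, i), sense) for toks, i, sense in train
)
colloc_model = NaiveBayesWSD(alpha=1.0).fit(
    (collocation_features(toks, i), sense) for toks, i, sense in train
)

bow_prediction = bow_model.predict(bow_features(test_tokens, test_index))
colloc_prediction = colloc_model.predict(collocation_features(test_tokens, test_index))
print(bow_prediction, colloc_prediction)
```

Tagging each collocation feature with its offset keeps word position in the feature, whereas the BoW extractor discards position entirely. Varying `alpha` changes how much probability mass is reserved for feature and sense pairs that never co-occurred in training, which is one way a smoothing parameter can affect accuracy.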
