Sense disambiguation for Punjabi language using supervised machine learning techniques

Varinder Pal Singh,Parteek Kumar

doi:10.1007/s12046-019-1206-x

Abstract

Automatic identification of a meaning of a word in a context is termed as Word Sense Disambiguation (WSD). It is a vital and hard artificial intelligence problem used in several natural language processing applications like machine translation, question answering, information retrieval, etc. In this paper, an explicit WSD system for Punjabi language using supervised techniques has been analysed. The sense tagged corpus of 150 ambiguous Punjabi noun words has been manually prepared. The six supervised machine learning techniques Decision List, Decision Tree, Naive Bayes, K-Nearest Neighbour (K-NN), Random Forest and Support Vector Machines (SVM) have been investigated in this proposed work. Every classifier has used same feature space encompassing lexical (unigram, bigram, collocations, and co-occurrence) and syntactic (part of speech) count based features. The semantic features of Punjabi language have been devised from the unlabelled Punjabi Wikipedia text using word2vec continuous bag of word and skip gram shallow neural network models. Two deep learning neural network classifiers multilayer perceptron and long short term memory have also been applied for WSD of Punjabi words. The word embedding features have experimented on six classifiers for the Punjabi WSD task. It has been observed that the performance of the supervised classifiers applied for the WSD task of Punjabi language has been enhanced with the application of word embedding features. In this work, an accuracy of 84% has been achieved by LSTM classifier using word embedding feature.

Full Text