Abstract

Word Sense Disambiguation (WSD) is important for accurately interpreting natural-language text. Various supervised learning-based and knowledge-based models have been developed in the literature for WSD; however, these models perform poorly on low-resource languages owing to the lack of labelled and sense-tagged data. In this work, we therefore examine different word embedding techniques for word sense disambiguation of Hindi texts. Several studies show that such embeddings have been applied to WSD in other languages; to the best of our knowledge, however, no such work exists for Hindi. We utilize various existing word embeddings for WSD of Hindi text and, in addition, train Hindi word embeddings on articles taken from Wikipedia, evaluating the quality of the resulting embeddings using the Pearson correlation. Our experiments show that the Word2Vec model gives the best performance among all the considered embeddings on the Hindi dataset used. In our method, the proposed model directly takes input represented with the trained word embeddings and builds a sense inventory using clustering, which is then employed to perform disambiguation. Experimental observations indicate that the proposed approach achieves moderate but competitive accuracy. The paper thus presents how WSD can leverage these representations to encode rich semantic information.
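The intrinsic evaluation mentioned above, scoring embedding quality by the Pearson correlation between model similarities and human similarity judgements, can be sketched as follows. This is a minimal stdlib illustration, not the paper's implementation: the toy vectors and human ratings below are hypothetical stand-ins for embeddings that would actually be trained (e.g. with Word2Vec) on Hindi Wikipedia.

```python
import math

# Hypothetical toy embeddings standing in for vectors trained on Hindi
# Wikipedia; real vectors would come from a Word2Vec-style model.
embeddings = {
    "सोना": [0.9, 0.1, 0.0],    # gold
    "चांदी": [0.8, 0.2, 0.0],   # silver
    "नदी": [0.0, 0.1, 0.9],     # river
    "किनारा": [0.1, 0.0, 0.8],  # bank / shore
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

# Word pairs with hypothetical human similarity ratings in [0, 1].
pairs = [
    ("सोना", "चांदी", 0.85),
    ("नदी", "किनारा", 0.70),
    ("सोना", "नदी", 0.10),
]

model_scores = [cosine(embeddings[a], embeddings[b]) for a, b, _ in pairs]
human_scores = [h for _, _, h in pairs]

# High r means model similarities track human judgements.
r = pearson(model_scores, human_scores)
print(f"Pearson correlation: {r:.3f}")
```

In practice this evaluation is run over a standard word-similarity benchmark rather than a handful of pairs; the single scalar `r` is what lets different embedding models (Word2Vec, fastText, GloVe, etc.) be compared on the same dataset.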
