Abstract

Word Sense Disambiguation (WSD) is a significant problem in Natural Language Processing (NLP). WSD is the task of identifying the correct sense of a word in a given context. It can improve numerous NLP applications, such as machine translation, text summarization, information retrieval, and sentiment analysis. This paper proposes an approach named ShotgunWSD, an unsupervised, knowledge-based algorithm for global word sense disambiguation that is inspired by the shotgun sequencing technique. ShotgunWSD is used to disambiguate word senses in Telugu documents in three functional phases. It performs better than other WSD approaches at disambiguating ambiguous words in Telugu documents. The dataset is drawn from Indo-WordNet.

Highlights

  • Telugu is morphologically rich and more complex than other languages, and word sense disambiguation based on word co-occurrence improves the recall of information retrieval systems

  • We apply the ShotgunWSD approach to Telugu data; it combines an unsupervised machine learning approach with a knowledge-based approach

  • To disambiguate Telugu words, the proposed methodology estimates the semantic relatedness between the context in which an ambiguous word is used and its sense definitions, in order to extract the senses of the ambiguous word from Telugu corpora
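
The last highlight, comparing the context of an ambiguous word against its sense definitions, can be illustrated with a very small sketch. This is not the ShotgunWSD algorithm itself: it is a simplified gloss-overlap (Lesk-style) scorer, swapped in only to show the general idea. The sense inventory, sense labels, stopword list, and function names are all hypothetical, and English is used for readability; a real system would read Telugu glosses from Indo-WordNet.

```python
from typing import Dict, Set

# Hypothetical toy sense inventory (assumption, not from the paper).
# A real system would load glosses from Indo-WordNet instead.
SENSES: Dict[str, Dict[str, str]] = {
    "bank": {
        "bank.finance": "an institution that accepts deposits and lends money",
        "bank.river": "the sloping land beside a river or stream of water",
    }
}

# Small illustrative stopword list (assumption).
STOPWORDS: Set[str] = {
    "a", "an", "and", "the", "of", "or", "that", "to", "in", "on", "at", "by",
}


def tokens(text: str) -> Set[str]:
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def overlap_score(context: str, gloss: str) -> int:
    """Count the distinct content words shared by the context and a gloss."""
    return len(tokens(context) & tokens(gloss))


def disambiguate(word: str, context: str) -> str:
    """Return the sense label whose gloss overlaps most with the context.

    Ties are resolved in favour of the sense listed first.
    """
    glosses = SENSES[word]
    return max(glosses, key=lambda sense: overlap_score(context, glosses[sense]))


if __name__ == "__main__":
    print(disambiguate("bank", "she sat by the river and watched the water"))
    print(disambiguate("bank", "he went to deposit money at the bank"))
```

Plain word overlap is the crudest possible relatedness measure; it misses inflected forms, which matters a great deal for a morphologically rich language such as Telugu, so any practical scorer would need stemming or a richer similarity function.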


Summary

INTRODUCTION

Natural language [1] [2] is full of ambiguity; many words have different meanings in different contexts. Word Sense Disambiguation (WSD) is the task of recognizing which sense of an ambiguous word is being used in a given context: using the context, a WSD system must decide which sense of the word is intended. WSD can potentially improve numerous NLP applications, for example machine translation, text summarization, information retrieval, and sentiment analysis. Telugu is morphologically rich and more complex than other languages, and word sense disambiguation based on word co-occurrence improves the recall of information retrieval systems. Using synsets when applying sense counts improves the robustness of the system. The Telugu data is collected from Indo-WordNet.
