A Lightweight Stemmer for Devanagari Script

Shruti R Dangui, Nitesh Naik

doi:10.1145/2835043.2835061

Abstract

Stemming is an operation that reduces the morphological variants of a word to a common stem. It is a pre-processing step used in various natural language processing applications such as text summarization, information retrieval, word sense disambiguation, and document clustering. Stemming improves the performance of information retrieval systems in two ways. It increases recall, because the words in a query are matched with their linguistic variants in the documents, and it reduces index size, which in turn increases speed and lowers memory requirements. Languages written in the Devanagari script include Hindi, Marathi, and Konkani. The proposed idea is to develop a common stemmer for languages in the Devanagari script using a supervised approach, and to evaluate the performance of that stemmer.
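
To make the recall and index-size claims concrete, the following is a minimal sketch of suffix stripping feeding an inverted index. It is not the supervised method proposed in this paper: the suffix list, the `min_stem_len` threshold, and the names `stem` and `build_index` are all assumptions chosen for illustration, using a few common Hindi inflectional endings.

```python
# Illustrative suffix list (an assumption, not taken from the paper).
SUFFIXES = ["ियों", "ियाँ", "ों", "ें", "ा", "े", "ी"]


def stem(word, min_stem_len=2):
    """Strip the longest matching suffix, keeping at least min_stem_len code points."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def build_index(docs):
    """Map each stem to the set of document ids that contain any of its variants."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.split():
            index.setdefault(stem(token), set()).add(doc_id)
    return index


# Three documents, each holding a different inflected form of "किताब" (book).
docs = {1: "किताबें", 2: "किताबों", 3: "किताब"}
index = build_index(docs)

# The query is stemmed the same way before lookup.
hits = index.get(stem("किताबें"), set())
```

In this sketch the three surface forms collapse to a single index entry, and a query using any one form retrieves all three documents. Without stemming the index would hold three entries and the same query would match only one document.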

Full Text