Design and Development of Marathi Word Stemmer

P Vaishali Kadam, B Kalpana Khandale, C Namrata Mahender

doi:10.1007/978-981-16-7389-4_4

Abstract

Stemming is a basic morphological analysis process that reduces the variants of a word to its root, or stem. It is used as a preprocessing step in many natural language text applications, such as part-of-speech tagging, sentiment analysis, text segmentation, text classification, text summarization, information extraction, information retrieval, and named entity recognition. In information retrieval systems, stemming generally improves performance by normalizing the text. Without stemming, the many inflected forms of the same root word are each stored in the system database, which increases storage requirements. With stemming, index files shrink because the linguistic variants found in the documents are collapsed onto a single stem, which helps memory management and system performance. Our proposed system is designed specifically for Marathi text: stemmers with good results and performance are already available for other languages written in the Devanagari script, such as Hindi and Konkani, but very little work has been done on stemmers for Marathi. The proposed idea is to develop a Marathi stemmer using a supervised machine learning approach together with our own handwritten grammar rule set and word dictionary, which are used to remove the suffixes of the different variants and generate their respective stems. Finally, the system is evaluated, and its performance is measured at 61.34%.

Keywords: NLP, Stemming, Linguistic variants, Suffix removal, Devanagari scripts
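
The suffix-removal idea described in the abstract can be illustrated with a minimal Python sketch. It covers only the rule-and-dictionary part, not the supervised learning component, and the suffix list, the dictionary entries, and the function name `stem` are all assumptions made for illustration; they are not the rule set or dictionary developed in this work.

```python
# Hypothetical suffix list and word dictionary, for illustration only.
SUFFIXES = ["ला", "ने", "चा", "ची", "चे", "े"]
DICTIONARY = {"नदी", "गाडी", "पुस्तक"}


def stem(word, suffixes=SUFFIXES, dictionary=DICTIONARY):
    """Return a dictionary stem for `word`, or `word` itself if none is found."""
    # A word that is already a known stem needs no stripping.
    if word in dictionary:
        return word
    # Try longer suffixes first so the longest matching suffix wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)]
            # Accept the candidate only if the dictionary confirms it.
            if candidate in dictionary:
                return candidate
    # No rule produced a known stem, so leave the word unchanged.
    return word


if __name__ == "__main__":
    for w in ["नदीला", "गाडीने", "पुस्तके", "आकाश"]:
        print(w, "->", stem(w))
```

Checking each candidate against the dictionary keeps a rule from stripping characters that only look like a suffix. A fuller rule set would also need to handle the stem changes that many Marathi words undergo before a suffix is attached.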

Full Text