Evaluating the Effect of Preprocessing Tools for Marathi Text Retrieval

Harshali B. Patil, Ajay S. Patil

doi:10.1016/j.procs.2024.03.279

Abstract

The dramatic growth of e-content available on the Internet in non-English languages has motivated researchers to develop tools and techniques for the automated processing of these languages. Retrieving meaningful information from this massive volume of data is a challenging task; hence, information retrieval for non-English languages has gained increasing attention over the last decade. Pre-processing tools such as stemmers, stop-word removers, and lemmatizers have proven highly effective for information retrieval in many languages, including English, Arabic, and Hindi. The goal of this work is to propose a simple stemmer for the Marathi language based on a suffix-stripping mechanism and to evaluate its impact, together with stop-word removal, on Marathi text retrieval. The results show a significant improvement in precision, R-precision, precision@10, and recall when the proposed suffix stripper and stop-word removal tool are used for Marathi text retrieval.
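
For readers unfamiliar with the evaluation measures named above, the following is a minimal sketch of their standard textbook definitions for a single query. It is not the evaluation code used in this work. The function names, the document identifiers, and the assumption that each ranked list contains unique identifiers are all illustrative.

```python
# Illustrative sketch only: standard definitions of the four measures,
# computed for one query. `ranked` is a ranked list of retrieved document
# ids (assumed unique); `relevant` is the set of ids judged relevant.

def precision(ranked, relevant):
    """Fraction of retrieved documents that are relevant."""
    if not ranked:
        return 0.0
    return len(set(ranked) & relevant) / len(ranked)

def recall(ranked, relevant):
    """Fraction of relevant documents that were retrieved."""
    if not relevant:
        return 0.0
    return len(set(ranked) & relevant) / len(relevant)

def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k ranks occupied by relevant documents."""
    hits = sum(1 for doc in ranked[:k] if doc in relevant)
    return hits / k

def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    hits = sum(1 for doc in ranked[:r] if doc in relevant)
    return hits / r

# Hypothetical query: five retrieved documents, three judged relevant.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant = {"d1", "d3", "d9"}
scores = {
    "precision": precision(ranked, relevant),
    "recall": recall(ranked, relevant),
    "precision@10": precision_at_k(ranked, relevant, k=10),
    "r-precision": r_precision(ranked, relevant),
}
```

In a comparison of the kind described here, such per-query scores would typically be averaged over all queries, once for the baseline run and once for the run with stemming and stop-word removal applied.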

Full Text