Abstract

Nowadays, text documents are proliferating across the internet, e-mail, and web pages. As internet use grows exponentially, the need for massive data storage keeps increasing. Many documents contain morphological variants of words, so stemming, a preprocessing technique, maps the different morphological variants of a word to their base form, called the stem. Stemming is used in information retrieval applications to improve retrieval performance, based on the assumption that terms with the same stem usually have similar meanings. Stemming large document collections normally requires more computation time and power to keep up with the need to search for a particular word in the data. This paper analyzes various stemming algorithms, along with the benefits and limitations of recent stemming approaches.

Keywords: Natural Language Processing Applications, Information Retrieval, Information Retrieval Applications (IRAs), Stemming Approaches

DOI: 10.7176/IKM/10-3-01

Publication date: April 30th 2020

Highlights

  • In all information retrieval applications, the main goal is to improve recall and precision

  • Stemming is a preprocessing step in text mining applications as well as a very common requirement of natural language processing functions

  • The capacity of search databases has increased in the last few years, so to meet the challenge of real-time search, natural language application algorithms need to be sped up

Summary

Introduction

In all information retrieval applications, the main goal is to improve recall and precision. The capacity of search databases has increased in the last few years, so to meet the challenge of real-time search, natural language application algorithms need to be sped up. Those texts typically contain many different syntactic variants of a word. For example, connected, connecting, connection, connectedly, connectedness, connectively, connectional, connective, connectable (adjective), and connector (noun) are all derived from the root word “connect” (Tesfaye 2010) (Tesfaye n.d.).

Successor Variety Approach

According to (Sousa and Castro n.d.), successor variety is one of the stemming approaches used in natural language processing applications, especially in information retrieval systems. In this approach, the successor variety of a string is the number of different characters that follow the string in words in a corpus (the body of text).

If a B-tree or hash table is used for the lookup, it is fast, but such a table carries a storage overhead (Bellovin and Rescorla 2005).
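
To make the successor variety definition concrete, the following is a minimal Python sketch, not the authors' implementation. It counts, for each prefix, the distinct characters that follow it in a small corpus built from the "connect" word family listed above. The function names (`build_successor_table`, `successor_variety`, `stem_by_peak`) are hypothetical, and cutting the word at the prefix with the highest successor variety is a simplifying assumption; other cutoff rules exist.

```python
def build_successor_table(corpus):
    """Map every proper prefix in the corpus to the set of characters that follow it."""
    table = {}
    for word in corpus:
        word = word.lower()
        for i in range(1, len(word)):
            table.setdefault(word[:i], set()).add(word[i])
    return table


def successor_variety(prefix, table):
    """Number of distinct characters that follow `prefix` in the corpus."""
    return len(table.get(prefix, ()))


def stem_by_peak(word, table):
    """Assumed cutoff rule: return the prefix with the highest successor variety.

    On ties, the shortest such prefix is returned.
    """
    word = word.lower()
    prefixes = [word[:i] for i in range(1, len(word) + 1)]
    return max(prefixes, key=lambda p: successor_variety(p, table))


# Toy corpus taken from the word family used as an example in the text.
corpus = [
    "connect", "connected", "connecting", "connection", "connectedly",
    "connectedness", "connectively", "connectional", "connective",
    "connectable", "connector",
]
table = build_successor_table(corpus)

print(successor_variety("connect", table))   # followed by e, i, a, o
print(successor_variety("connecti", table))  # followed by n, o, v
print(stem_by_peak("connecting", table))
```

In this sketch the table is a Python dictionary, which is a hash table, so each lookup is fast, but it holds one entry for every prefix in the corpus, which illustrates the kind of storage overhead mentioned above.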

N-Gram Method
Conclusion