Kurdish stemmer pre-processing steps for improving information retrieval

Arazo M Mustafa, Tarik A Rashid

doi:10.1177/0165551516683617

Abstract

The rapid increase in the number of Kurdish documents over the last several years has created a need to improve accuracy and precision in text classification and information retrieval. Stemming is an essential pre-processing step in text classification tasks because it increases the likelihood of matching terms in a document. It also reduces the total number of searchable terms within a document or query. This article proposes an approach for stemming Kurdish Sorani texts that reduces variant word forms to single terms, or stems. The results described in this article demonstrate that using stemming to decrease the dimensionality of document feature vectors increases retrieval effectiveness. The process developed for Kurdish Sorani can also be adapted and applied to Kurdish Kurmanji, giving greater efficiency and effectiveness in digital text classification and related applications.
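To make the dimensionality argument concrete, the Python sketch below strips a few common Sorani noun suffixes (definite plural ەکان, definite singular ەکە, indefinite ێک / یەک, plural ان) and compares vocabulary size, i.e. the length of a bag-of-words feature vector, before and after stemming. The suffix list, the `MIN_STEM_LEN` guard, and the names `stem` and `vocabulary` are illustrative assumptions, not the rule set proposed in the article.

```python
# Illustrative only: a tiny longest-match suffix-stripping stemmer for
# Kurdish Sorani. The suffix list is a hypothetical, hand-picked sample
# of common noun suffixes, NOT the rules proposed in the article.

# Sorted longest-first so that "ەکان" is tried before "ان".
SUFFIXES = sorted(["ەکان", "ەکە", "یەک", "ێک", "ان"], key=len, reverse=True)

# Do not strip if fewer than this many characters would remain;
# a crude guard against over-stemming short words.
MIN_STEM_LEN = 3


def stem(word):
    """Remove at most one suffix: the longest one that matches."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[: -len(suffix)]
    return word


def vocabulary(tokens, stemmer=None):
    """Distinct terms, i.e. the dimensions of a bag-of-words feature vector."""
    return sorted({stemmer(t) if stemmer else t for t in tokens})


# Inflected forms of کتێب ("book") and خوێندکار ("student").
TOKENS = ["کتێب", "کتێبەکان", "کتێبەکە", "کتێبێک",
          "خوێندکار", "خوێندکارەکان", "خوێندکارێک"]

print(len(vocabulary(TOKENS)))        # 7 distinct surface forms
print(len(vocabulary(TOKENS, stem)))  # 2 distinct stems
```

Seven surface forms collapse into two stems, so each document vector shrinks accordingly and a query for کتێب now matches کتێبەکان. The length guard keeps a short word such as ژیان ("life") from being cut to ژی, but a fixed list like this will still over- or under-stem many words, which is why a practical Sorani stemmer needs a fuller, linguistically informed rule set.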
