Abstract

The rapid increase in the quantity of Kurdish documents over the last several years has created a need for improving information accuracy and precision in text classification and retrieval. Language stemming is an imperative pre-processing step for increasing the possibility of matching terms in a document in text classification tasks. Stemming helps reduce the total number of searchable terms within a document or query. This article proposes an active approach for stemming Kurdish Sorani texts to reduce variations of words to single terms or stems. The outcomes of the process, described in this article, demonstrate that decreasing the dimensionality of feature vectors in documents will increase the effectiveness of retrieval when the stemming process is used. This process applied for Kurdish Sorani can be adapted and applied in Kurdish Kurmanji as well for greater efficiency and effectiveness in digital text classification and applications.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call