Abstract

Existing Sundanese stemmers either ignore reduplicated words or define rules that handle only affixes. Reduplicated words are common in Sundanese, so high stemming precision cannot be achieved without addressing them. This paper presents an improved stemmer for the Sundanese language that handles both affixed and reduplicated words. We use a rule-based stemming technique together with a Sundanese root word list. In our approach, all stems produced by the affix removal or normalization processes are added to a stem list. Using a stem list helps increase stemmer accuracy by reducing stemming errors caused by an incorrect affix removal order or by morphological issues. The existing Sundanese stemmer, RBSS, was used for comparison. Two datasets with 8218 unique affixed and reduplicated words were evaluated. The results show a noticeable improvement in our stemmer's strength and accuracy. The use of the stem list and word reduplication rules improved our stemmer's affix type recognition and allowed us to achieve up to 99.30% accuracy.
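To make the stem-list idea concrete, the following is a minimal sketch and not the stemmer evaluated in this paper. The root word list, the affix inventories, and the function names `candidate_stems` and `stem` are illustrative assumptions. Only full reduplication written with a hyphen is handled. The sketch keeps every intermediate form produced by reduplication handling and affix removal, then returns the first form found in the root word list, so a single wrong removal order does not decide the outcome.

```python
# Toy root word list and affix inventories: illustrative assumptions only,
# not the lexicon or rule set used in the paper.
ROOT_WORDS = {"budak", "imah", "baca", "dahar"}
PREFIXES = ("di", "ka", "pa", "sa")
SUFFIXES = ("keun", "an", "na")


def candidate_stems(word):
    """Collect candidate stems from reduplication handling and affix removal."""
    candidates = [word]
    # Full reduplication such as "budak-budak": keep one copy if both halves match.
    if "-" in word:
        left, right = word.split("-", 1)
        if left == right:
            candidates.append(left)
    # Strip one prefix from every form collected so far, keeping the results.
    for form in list(candidates):
        for prefix in PREFIXES:
            if form.startswith(prefix) and len(form) > len(prefix) + 2:
                candidates.append(form[len(prefix):])
    # Strip one suffix from every form collected so far (with or without prefix).
    for form in list(candidates):
        for suffix in SUFFIXES:
            if form.endswith(suffix) and len(form) > len(suffix) + 2:
                candidates.append(form[:-len(suffix)])
    return candidates


def stem(word):
    """Return the first candidate present in the root word list, else the input."""
    for candidate in candidate_stems(word.lower()):
        if candidate in ROOT_WORDS:
            return candidate
    return word
```

For example, `stem("dibacakeun")` collects `bacakeun`, `dibaca`, and `baca` as candidates and returns `baca`, the only one present in the toy root word list. A word with no matching candidate is returned unchanged.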
