Abstract

Current Sundanese stemmers either ignore reduplicated words or define rules that handle only affixes. Sundanese contains a significant number of reduplicated words, so superior stemming precision cannot be achieved without addressing them. This paper presents an improved stemmer for the Sundanese language that handles both affixed and reduplicated words. We use a rule-based stemming technique together with a Sundanese root word list. In our approach, every stem produced by the affix removal or normalization processes is added to a stem list. Using a stem list helps increase stemmer accuracy by reducing stemming errors caused by an incorrect affix removal order or by morphological issues. The existing Sundanese stemmer, RBSS, was used as the baseline for comparison. Two datasets with 8218 unique affixed and reduplicated words were evaluated. The results show that our stemmer's strength and accuracy improved noticeably. The use of the stem list and the word reduplication rules improved our stemmer's recognition of affix types and allowed us to achieve up to 99.30% accuracy.
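The stem-list idea can be illustrated with a minimal sketch. This is not the paper's rule set: the root list, the prefix and suffix lists, the minimum-length guard, and the function names `candidate_stems` and `stem` are all assumptions chosen for illustration, and only fully reduplicated words written with a hyphen are handled. The point it shows is that every intermediate form is kept as a candidate and checked against the root list, so a result does not depend on a single fixed affix removal order.

```python
# Toy data: illustrative assumptions, not the paper's actual resources.
ROOTS = {"dahar", "imah", "budak"}
PREFIXES = ("di", "ka", "pa")
SUFFIXES = ("keun", "eun", "an")
MIN_STEM_LEN = 3  # assumed guard against stripping a word down to nothing


def candidate_stems(word):
    """Return the stem list: the word plus every form produced by a rule."""
    stems = [word]

    # Full reduplication written with a hyphen: keep one copy of the base.
    if "-" in word:
        left, right = word.split("-", 1)
        if left == right:
            stems.append(left)

    # Prefix removal, applied to every candidate collected so far.
    for form in list(stems):
        for prefix in PREFIXES:
            if form.startswith(prefix) and len(form) - len(prefix) >= MIN_STEM_LEN:
                stems.append(form[len(prefix):])

    # Suffix removal, applied to every candidate, including de-prefixed ones.
    for form in list(stems):
        for suffix in SUFFIXES:
            if form.endswith(suffix) and len(form) - len(suffix) >= MIN_STEM_LEN:
                stems.append(form[:-len(suffix)])

    return stems


def stem(word):
    """Return the first candidate found in the root list, else the word itself."""
    word = word.lower()
    for candidate in candidate_stems(word):
        if candidate in ROOTS:
            return candidate
    return word


if __name__ == "__main__":
    for w in ("didahar", "dahareun", "imah-imah"):
        print(w, "->", stem(w))
```

Under these toy rules, `didahar` and `dahareun` both reduce to `dahar`, and `imah-imah` reduces to `imah`; a word with no candidate in the root list is returned unchanged.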
