ANoM STEMMER: Nazief & Andriani Modification for Madurese Stemming

Enni Lindrawati, Aiinul Yaqin, Ema Utami

doi:10.29207/resti.v7i6.5086

Open Access

Abstract

Madurese is one of the regional languages of Indonesia and part of the country's cultural heritage that needs to be preserved. Its distinctive word-formation rules make it a suitable subject for information retrieval research, specifically stemming. Madurese is closely related to Javanese, and several studies have applied stemming methods such as modifications of the Nazief and Adriani algorithm, which perform well for Javanese. However, no such research has been conducted for Madurese, so the method's effectiveness for that language has not been demonstrated. Previous studies have also not applied the morphophonemic rules that govern word formation in Madurese. This research therefore modifies the Nazief and Adriani algorithm for Madurese, based on Madurese morphology, by removing affixes, namely ter-ater (prefixes) and panoteng (suffixes), and by applying morphophonemic rules. The corpus consists of 1,000 affixed words taken from a Madurese dictionary. Overall, the algorithm achieves an accuracy of 89.0%, stemming 890 words correctly, with an error rate of 11%. By affix type, accuracy is 93.81% for prefixed words, 83.78% for suffixed words, and 80.07% for confixed words. Of the 110 errors, 104 are cases of understemming and 6 of overstemming. The reported execution time is 31.31 seconds.
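The dictionary-based affix stripping summarized above can be illustrated with a minimal Python sketch in the general spirit of Nazief and Adriani: look the word up in a root dictionary, strip a suffix, then strip a prefix (alone or together with a suffix, covering confixes), use a nasal-prefix recoding table as a stand-in for morphophonemic rules, and return the input unchanged if no root is found. The affix lists, the recoding table, the toy dictionary, and the names `stem`, `strip_suffix`, `strip_prefix`, `rates`, and `evaluate` are hypothetical placeholders, not the rules, data, or code used in the paper. The `rates` helper only reproduces the abstract's arithmetic: 890 correct out of 1,000 gives 89% accuracy and an 11% error rate, consistent with 104 understemmed plus 6 overstemmed words.

```python
"""Minimal dictionary-based affix-stripping stemmer in the spirit of
Nazief & Adriani, plus the accuracy arithmetic reported in the abstract.

ASSUMPTION: every affix list, recoding rule, and dictionary entry below is a
hypothetical placeholder for illustration, not the paper's actual rule set.
"""

# Hypothetical toy root dictionary (the paper uses a Madurese dictionary).
ROOTS = {"kakan", "baca", "tulis"}

# Hypothetical affix inventories, longest first so longer affixes are tried first.
SUFFIXES = ["agi", "na", "an", "a", "e"]        # stand-ins for panoteng
PREFIXES = ["ka", "pa", "sa", "ta", "e", "a"]   # stand-ins for ter-ater

# Hypothetical morphophonemic recoding for a nasal prefix:
# surface onset -> possible root onsets ("" means the root starts with a vowel).
NASAL_RECODE = {"ng": ["k", ""], "m": ["p", "b"], "n": ["t", "d"], "ny": ["c", "s"]}

MIN_ROOT = 3  # do not strip an affix if fewer than 3 characters would remain


def strip_suffix(word):
    """Yield every candidate left after removing one listed suffix."""
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= MIN_ROOT:
            yield word[: -len(s)]


def strip_prefix(word):
    """Yield candidates after removing one prefix, including nasal recoding."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= MIN_ROOT:
            yield word[len(p):]
    # Try two-letter nasal onsets ("ng", "ny") before single letters ("m", "n").
    for onset in sorted(NASAL_RECODE, key=len, reverse=True):
        if word.startswith(onset):
            rest = word[len(onset):]
            for root_onset in NASAL_RECODE[onset]:
                candidate = root_onset + rest
                if len(candidate) >= MIN_ROOT:
                    yield candidate


def stem(word, roots=ROOTS):
    """Return the first dictionary root reachable by affix stripping, else word."""
    word = word.lower()
    if word in roots:                            # 1. already a root
        return word
    for cand in strip_suffix(word):              # 2. suffix only
        if cand in roots:
            return cand
    for base in [word, *strip_suffix(word)]:     # 3. prefix, or prefix + suffix (confix)
        for cand in strip_prefix(base):
            if cand in roots:
                return cand
    return word                                  # 4. no root found: return input unchanged


def rates(correct, total):
    """Accuracy and error rate, both as percentages."""
    accuracy = 100.0 * correct / total
    return accuracy, 100.0 - accuracy


def evaluate(pairs, roots=ROOTS):
    """pairs: (affixed_word, expected_root) tuples; returns (accuracy, error_rate)."""
    correct = sum(stem(word, roots) == gold for word, gold in pairs)
    return rates(correct, len(pairs))


if __name__ == "__main__":
    print(stem("ngakan"))        # nasal prefix with recoding -> kakan
    print(stem("katulisan"))     # confix (prefix + suffix) -> tulis
    print(rates(890, 1000))      # figures from the abstract -> (89.0, 11.0)
```

Removing suffixes before prefixes and checking the dictionary after every step follows the general order of the Nazief and Adriani algorithm; in a real Madurese stemmer, the paper's ter-ater and panoteng lists and its morphophonemic rules would replace the placeholder tables.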