Abstract

Madurese is one of Indonesia's regional languages and a cultural asset that needs to be preserved. Its distinctive word-formation rules make it a candidate for information retrieval tasks, in particular stemming. Madurese is closely related to Javanese, for which several studies have used stemming methods such as a modified Nazief and Adriani algorithm with good results. That approach, however, has not been studied for Madurese, so its effectiveness there is unproven. Previous studies also did not use the morphophonemic rules that influence word formation in Madurese. This research therefore modifies the Nazief and Adriani algorithm for Madurese on the basis of Madurese morphology: it removes affixes, namely ter-ater (prefixes) and panoteng (suffixes), and applies morphophonemic rules. The corpus consists of 1000 affixed words taken from a Madurese dictionary. The algorithm stems 890 words correctly, for an overall accuracy of 89% and an error rate of 11%. By affix type, accuracy is 93.81% for prefixes, 83.78% for suffixes, and 80.07% for confixes. Understemming occurs in 104 words and overstemming in 6 words. Processing takes 31.31 seconds.
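
The reported figures can be cross-checked with a little arithmetic. The sketch below is only an illustration, not the authors' evaluation code: the function name `stemming_metrics` and its parameters are hypothetical, and it assumes every error is counted as either understemming or overstemming, which matches the totals given in the abstract.

```python
def stemming_metrics(total, understemmed, overstemmed):
    """Derive accuracy and error rate from stemming error counts.

    Assumption: each incorrectly stemmed word is counted exactly once,
    either as understemmed or as overstemmed.
    """
    errors = understemmed + overstemmed
    correct = total - errors
    return {
        "correct": correct,
        "errors": errors,
        "accuracy_pct": 100.0 * correct / total,
        "error_rate_pct": 100.0 * errors / total,
    }


# Counts taken from the abstract: 1000 words, 104 understemmed, 6 overstemmed.
metrics = stemming_metrics(total=1000, understemmed=104, overstemmed=6)
print(metrics)
```

With the abstract's counts, the 104 understemmed and 6 overstemmed words add up to 110 errors, which leaves 890 correct words and agrees with the stated 89% accuracy and 11% error rate.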
