Multi Rule-based and Corpus-based for Sundanese Stemmer

Ade Sutedi, Rickard Elsen, Muhammad Rikza Nasrulloh

doi:10.15575/join.v7i2.846

Abstract

The purpose of this study is to develop a Sundanese stemming method that combines several techniques: morphological rules (affix and pro-lexeme removal), syllable (canonical) patterns, and corpus data used to verify the final stemming result. The algorithm first checks the length of the input string and removes affixes, then checks the syllable pattern of the stripped result, and finally compares the result with the corpus data, which determines the final stem. In this study, the corpus consists of single-word entries from a Sundanese dictionary, used as root words, and a dataset extracted from an online Sundanese magazine. The results show that affix and pro-lexeme stripping removes the corresponding affixes and pro-lexemes, that matching the stripped words against syllable patterns produces root words quickly, and that using the corpus improves accuracy and reduces the over-stemming that occurs in the stemming process.
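The pipeline summarized above (length check, affix and pro-lexeme stripping, syllable-pattern check, corpus comparison) can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the affix and pro-lexeme lists (`PROLEXEME_SUFFIXES`, `SUFFIXES`, `PREFIXES`), the minimum word length, the two-syllable rule for root candidates, and the fallback used when no corpus entry matches are all hypothetical choices made for illustration, and the toy corpus stands in for the dictionary and magazine data.

```python
# Hypothetical, illustrative inventories; the paper's actual affix and
# pro-lexeme lists are not reproduced here.
PROLEXEME_SUFFIXES = ("na",)               # assumption: possessive "-na"
SUFFIXES = ("keun", "eun", "an")           # assumption: small subset, longest first
PREFIXES = ("di", "ka", "pa", "sa", "ti")  # assumption: small subset
MIN_LENGTH = 4                             # assumption: shorter words are left as-is
MIN_SYLLABLES = 2                          # assumption: root candidates need >= 2 syllables

DIGRAPHS = {"eu": "V", "ng": "C", "ny": "C"}  # single sounds written with two letters
VOWELS = set("aiueoé")


def cv_pattern(word: str) -> str:
    """Return the consonant/vowel (C/V) pattern of a word, e.g. 'imah' -> 'VCVC'."""
    pattern, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            pattern.append(DIGRAPHS[pair])
            i += 2
        else:
            pattern.append("V" if word[i] in VOWELS else "C")
            i += 1
    return "".join(pattern)


def syllable_ok(word: str) -> bool:
    """Accept a candidate whose pattern has at least MIN_SYLLABLES vowel nuclei."""
    return cv_pattern(word).count("V") >= MIN_SYLLABLES


def strip_suffix(word: str, suffixes) -> str:
    """Remove the first matching suffix, keeping at least two letters."""
    for s in suffixes:
        if word.endswith(s) and len(word) - len(s) >= 2:
            return word[: -len(s)]
    return word


def strip_prefix(word: str, prefixes) -> str:
    """Remove the first matching prefix, keeping at least two letters."""
    for p in prefixes:
        if word.startswith(p) and len(word) - len(p) >= 2:
            return word[len(p):]
    return word


def candidates(word: str) -> list:
    """Candidates from least to most stripped: word, -pro-lexeme, -suffix, -prefix."""
    steps = [word]
    steps.append(strip_suffix(steps[-1], PROLEXEME_SUFFIXES))
    steps.append(strip_suffix(steps[-1], SUFFIXES))
    steps.append(strip_prefix(steps[-1], PREFIXES))
    return list(dict.fromkeys(steps))  # drop duplicates, keep order


def stem(word: str, corpus: set) -> str:
    word = word.lower()
    if len(word) < MIN_LENGTH:  # length check comes first
        return word
    cands = candidates(word)
    # The corpus decides: the least-stripped candidate that is a known root wins,
    # which guards against over-stemming (e.g. 'kaca' is not cut down to 'ca').
    for c in cands:
        if c in corpus and syllable_ok(c):
            return c
    # Fallback (assumption): most-stripped candidate that passes the syllable check.
    for c in reversed(cands):
        if syllable_ok(c):
            return c
    return word


if __name__ == "__main__":
    corpus = {"baca", "buku", "kaca", "imah"}  # hypothetical toy root-word corpus
    for w in ["dibaca", "imahna", "bukuna", "kaca"]:
        print(w, "->", stem(w, corpus))
```

With this toy corpus, "dibaca" stems to "baca", "imahna" to "imah", "bukuna" to "buku", and "kaca" stays "kaca" even though it begins with the prefix-like string "ka". Checking the least-stripped candidate against the corpus first is one simple way to let the corpus veto over-stemming; the syllable check alone also rejects "ca" here because it has only one vowel nucleus.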

Journal: Jurnal Online Informatika
Publication Date: Dec 29, 2022
License type: CC BY-NC-ND

Similar Papers

Sundanese Stemming using Syllable Pattern
Ade Sutedi ... Muhammad Rikza Nasrulloh
Jurnal Online Informatika | VOL. 6 | 26 Dec 2021

Examining Language Kinship Patterns through the Folktale I Kedis Cangak (Pedanda Baka) in Bali: A Comparative Historical Linguistic Analysis
Muhammad Aditya Wisnu Wardana ... Slamet Mulyono
Jurnal Pendidikan dan Kebudayaan (JURDIKBUD) | VOL. 3 | 26 Mar 2023

Distributional Stress Regularity: A Corpus Study
David Temperley
Journal of Psycholinguistic Research | VOL. 38 | 21 Oct 2008

An Automatic Algorithm for Locating the Beginning and End of an Utterance Using ADPCM Coded Speech
L H Rosenthal ... R W Schafer
The Journal of the Acoustical Society of America | VOL. 55 | 01 Feb 1974