Development of Indonesian Stemming Algorithms through Modification of Grouping, Sequencing and Removing of Affixes Based on Morphophonemic

doi:10.35940/ijrte.b1044.0782s719

Abstract

Text documents are stored on the system in an unstructured form, so the information they contain cannot be extracted directly. Extracting it requires text processing, beginning with preprocessing that converts the documents into a more structured form by selecting the words used as index terms. The smaller the index, the more readily the system recognizes the text documents and the more easily their information can be extracted. The size of the index is determined by the number of word groups formed. To avoid forming many word groups, each word is first reduced to its base word before grouping. The process of converting an affixed word into its base word using a set of rules is called stemming. This research aims to produce a new Indonesian stemming algorithm, the UG18 Stemmer, which reduces or eliminates the over-stemming and under-stemming errors of existing algorithms, including the Enhanced Confix Stripping (ECS) Stemmer and the New Enhanced Confix Stripping (NECS) Stemmer. The method follows a morphophonemic approach, which treats affixes as bound morphemes that undergo phoneme change, phoneme addition, and phoneme removal. These three processes are mapped, and a finite state automaton is built to obtain the new affix groups, sequences, and removal methods on which the UG18 Stemmer is based. Unlike the earlier algorithms, UG18 does not use a list of decapitation rules; these are replaced with morphophonemic removal rules. Evaluation and testing show that the UG18 Stemmer has a lower error rate than the NECS Stemmer.
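The abstract describes removing affixes by modelling the phoneme changes, additions, and removals that a prefix causes. As a rough illustration only, the Python sketch below handles the single Indonesian prefix meN-, whose nasal assimilates to the root and can replace the root's first phoneme (memukul from pukul, menulis from tulis, menyapu from sapu). The rule table `MEN_RULES`, the toy lexicon `ROOTS`, and the function `strip_men_prefix` are hypothetical; they do not reproduce UG18's affix groups, ordering, or finite state automaton, and a dictionary lookup is assumed for choosing among candidate roots.

```python
# Hypothetical toy root lexicon; a real stemmer would use a full Indonesian dictionary.
ROOTS = {"pukul", "baca", "tulis", "dengar", "sapu", "ambil",
         "gambar", "kirim", "cat", "lihat", "rasa"}

# Illustrative (not UG18) rules for the meN- prefix:
# (surface prefix, phonemes that the nasal may have replaced at the start of the root).
# "" means the root kept its own first phoneme.
MEN_RULES = [
    ("menge", [""]),      # menge- before monosyllabic roots: mengecat -> cat
    ("meny", ["s"]),      # initial s replaced by ny: menyapu -> sapu
    ("meng", ["", "k"]),  # before vowels, g, h; initial k dropped: mengirim -> kirim
    ("mem", ["", "p"]),   # before b, f, v; initial p dropped: memukul -> pukul
    ("men", ["", "t"]),   # before d, c, j; initial t dropped: menulis -> tulis
    ("me", [""]),         # before l, r, w, y, m, n: melihat -> lihat
]


def strip_men_prefix(word: str) -> str:
    """Return the root of a meN- word, or the word unchanged if no rule yields a known root."""
    for prefix, restorations in MEN_RULES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        # Undo the phoneme removal by trying each possible dropped phoneme.
        for phoneme in restorations:
            candidate = phoneme + rest
            if candidate in ROOTS:
                return candidate
        # No candidate matched: fall through to the next (shorter) prefix rule.
    return word
```

Because each rule lists the phonemes that may have been dropped, `strip_men_prefix("memukul")` tries `ukul` and then `pukul`, returning the first candidate found in the lexicon; a word that no rule can reduce, such as `buku`, comes back unchanged.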
This is shown by a randomized test of 2,500 words using relevance judgments validated by Indonesian language experts: the NECS Stemmer produced 1.48% over-stemming and 16.69% under-stemming, whereas the UG18 Stemmer produced 0.12% over-stemming and 0% under-stemming. In addition, in an information-retrieval-based document similarity measurement application, the UG18 Stemmer improved processing speed by 45.47% compared with the ECS Stemmer.
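The evaluation compares each stemmer's output with expert-validated roots and reports over-stemming, under-stemming, and a relative speed gain. The abstract does not state exactly how each word was labelled or how the speed figure was computed, so the Python sketch below rests on two assumptions: an output shorter than the correct root counts as over-stemmed and any other wrong output as under-stemmed, and speed improvement is the percentage reduction in processing time versus the baseline. The function names `classify`, `error_rates`, and `speed_improvement` are hypothetical.

```python
from collections import Counter


def classify(stem: str, gold_root: str) -> str:
    """Label one stemmer output against the expert-validated root (assumed rule)."""
    if stem == gold_root:
        return "correct"
    # Assumption: a result shorter than the true root means too much was removed.
    if len(stem) < len(gold_root):
        return "over"
    # Otherwise affix material was left behind (or the word was not reduced at all).
    return "under"


def error_rates(outputs: dict[str, str], gold: dict[str, str]) -> dict[str, float]:
    """Percentage of over- and under-stemmed words over the whole test set."""
    counts = Counter(classify(outputs[word], gold[word]) for word in gold)
    total = len(gold)
    return {
        "over_pct": 100.0 * counts["over"] / total,
        "under_pct": 100.0 * counts["under"] / total,
    }


def speed_improvement(baseline_seconds: float, new_seconds: float) -> float:
    """Relative reduction in processing time, in percent, versus a baseline stemmer."""
    return 100.0 * (baseline_seconds - new_seconds) / baseline_seconds
```

With a four-word toy set in which `perintah` (itself a root) is wrongly cut to `intah` and `berlari` is left unchanged instead of reduced to `lari`, `error_rates` reports 25.0% over-stemming and 25.0% under-stemming.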

More From: International Journal of Recent Technology and Engineering

Journal: International Journal of Recent Technology and Engineering
Publication Date: Sep 5, 2019
Citations: 1

Similar Papers

BENTUK PERUBAHAN FONOLOGIS DALAM BERKOMUNIKASI VIA WHATSAPP MAHASISWA UNIVERSITAS MUHAMMADIYAH MAKASSAR [Forms of Phonological Change in WhatsApp Communication among Students of Universitas Muhammadiyah Makassar]
Muliana Muliana ... Nur Rahmi
Neologia: Jurnal Bahasa dan Sastra Indonesia | VOL. 3
30 Aug 2022

STUDI GEJALA FONEMIS ANTARA BAHASA MELAYU RIAU DIALEK KAMPAR DAN BAHASA INDONESIA (Sebuah Pendekatan Historis) [A Study of Phonemic Phenomena between Riau Malay (Kampar Dialect) and Indonesian: A Historical Approach]
DOAJ (DOAJ: Directory of Open Access Journals) | VOL. -
02 Nov 2012

RAGAM BAHASA PADA TUTURAN PEDAGANG IKAN KABUPATEN DEMAK DITINJAU DARI KAJIAN FONOLOGI [Language Varieties in the Speech of Fish Traders in Demak Regency: A Phonological Study]
Rifqiana Azizah ... Turahmat Turahmat
Jurnal Pendidikan Bahasa Indonesia | VOL. 5
17 Nov 2017

Creating Alternatives through Design and Technology Innovation
Mohammed Ali Berawi
International Journal of Technology | VOL. 5
01 Jan 2014
