Combination of Genetic Algorithm and Brill Tagger Algorithm for Part of Speech Tagging Bahasa Madura

Nindian Puspa Dewi, Eka Rahayu Setyaningsih, Joan Santoso, Ubaidi Ubaidi

doi:10.11591/eecsi.v7.2034

Abstract

Part of speech (POS) refers to the class of a word in a sentence, such as verb, adjective, or noun. POS tagging is the process of marking every word in a sentence with its word class. POS tagging plays an important role as a basis for research in Natural Language Processing. This motivates research on POS tagging for Bahasa Madura as an effort to preserve and develop the use of regional languages. In this research, POS tagging is done using the Brill Tagger algorithm combined with a Genetic Algorithm (GA). Brill Tagger is a POS tagging algorithm reported to achieve the best level of accuracy when implemented in other languages. The Genetic Algorithm is used in the contextual learner process because previous studies indicate that it can speed up training and so make it more efficient. The results of this study are then compared with the results of the previous study to identify algorithms suitable for developing text processing in Bahasa Madura. Across a series of experiments, Brill Tagger obtains an average accuracy of 86.4% with a highest accuracy of 86.7%, while GA Brill Tagger obtains an average accuracy of 86.5% with a highest accuracy of 86.6%. Testing on out-of-vocabulary (OOV) words yields an average accuracy of 67.7% for Brill Tagger and 64.6% for GA Brill Tagger. Testing on words with multiple POS yields an average accuracy of 73.3% for Brill Tagger and 90.9% for GA Brill Tagger. This shows that GA Brill Tagger is more accurate than Brill Tagger, especially when multiple POS are considered, because GA Brill Tagger can generate more rules for handling words with multiple POS than the pure Brill Tagger.
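To make the idea of a GA-driven contextual learner more concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the rule templates, GA operators, or parameters, so everything here is an assumption for illustration: the toy English corpus and tag set, the single rule template "change tag A to B when the previous tag is C", and the GA settings (elitism, truncation selection, one-point crossover, a 0.3 mutation rate). All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus of (word, gold_tag) pairs; English placeholders,
# not data from the Madurese corpus used in the study.
CORPUS = [
    [("the", "DT"), ("run", "NN"), ("ended", "VB")],
    [("they", "PR"), ("run", "VB"), ("fast", "RB")],
    [("a", "DT"), ("walk", "NN"), ("helps", "VB")],
    [("we", "PR"), ("walk", "VB"), ("home", "NN")],
]
TAGS = ["DT", "NN", "VB", "PR", "RB"]


def lexical_tagger(corpus):
    """Initial stage: map each word to its most frequent gold tag."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def apply_rules(tags, rules):
    """Apply each (from_tag, to_tag, prev_tag) rule in order to one sentence."""
    tags = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        before = list(tags)  # rule conditions are checked on a snapshot
        for i in range(1, len(tags)):
            if before[i] == from_tag and before[i - 1] == prev_tag:
                tags[i] = to_tag
    return tags


def accuracy(corpus, lexicon, rules):
    """Fraction of tokens whose tag matches gold after applying the rules."""
    correct = total = 0
    for sentence in corpus:
        initial = [lexicon[word] for word, _ in sentence]
        predicted = apply_rules(initial, rules)
        correct += sum(p == gold for p, (_, gold) in zip(predicted, sentence))
        total += len(sentence)
    return correct / total


def random_rule(rng):
    """Draw one random instance of the assumed rule template."""
    return (rng.choice(TAGS), rng.choice(TAGS), rng.choice(TAGS))


def evolve_rules(corpus, lexicon, rng, n_rules=2, pop_size=20, generations=30):
    """Tiny GA: elitism, truncation selection, one-point crossover, mutation."""
    no_op = [("NN", "NN", "NN")] * n_rules  # changes nothing: the baseline
    population = [no_op] + [
        [random_rule(rng) for _ in range(n_rules)] for _ in range(pop_size - 1)
    ]

    def fitness(individual):
        return accuracy(corpus, lexicon, individual)

    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = [population[0]]  # elitism keeps the best individual
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_rules) if n_rules > 1 else 0
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:  # mutation: replace one rule
                child[rng.randrange(n_rules)] = random_rule(rng)
            children.append(child)
        population = children
    best = max(population, key=fitness)
    return best, fitness(best)


lexicon = lexical_tagger(CORPUS)
best_rules, best_fitness = evolve_rules(CORPUS, lexicon, random.Random(0))
print("baseline accuracy:", accuracy(CORPUS, lexicon, []))
print("best rule list:", best_rules, "accuracy:", best_fitness)
```

In this toy setting the ambiguous words "run" and "walk" play the role of words with multiple POS: the lexical stage can assign them only one tag, and a contextual rule such as ("NN", "VB", "PR") is what corrects them. Because the initial population contains a no-op rule list and the best individual is always carried over, the evolved rule list never scores below the lexical baseline on the training data.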

More From: Proceeding of the Electrical Engineering Computer Science and Informatics
