Abstract
Part of speech (POS) refers to the class of a word in a sentence, such as verb, adjective, or noun. POS tagging is the process of labeling every word in a sentence with its word class. POS tagging serves as a foundation for research in Natural Language Processing, which motivates this study of POS tagging for Bahasa Madura as an effort to preserve and develop the use of regional languages. In this research, POS tagging is performed using the Brill Tagger algorithm combined with a Genetic Algorithm (GA). Brill Tagger is a POS tagging algorithm that has shown the best accuracy when applied to other languages. The Genetic Algorithm is used in the contextual learner process because previous studies indicate that it can speed up training and make it more efficient. The results of this study are compared with those of the previous study to identify algorithms suitable for developing text processing in Bahasa Madura. Across a series of experiments, Brill Tagger achieves an average accuracy of 86.4% with a highest accuracy of 86.7%, while GA Brill Tagger achieves an average accuracy of 86.5% with a highest accuracy of 86.6%. When tested on OOV (Out of Vocabulary) words, the average accuracy is 67.7% for Brill Tagger and 64.6% for GA Brill Tagger. When multiple POS are considered, Brill Tagger produces an average accuracy of 73.3%, while GA Brill Tagger produces an average accuracy of 90.9%. This shows that GA Brill Tagger is more accurate than Brill Tagger, especially when multiple POS are considered, because GA Brill Tagger can generate more rules for handling multiple POS than pure Brill Tagger.
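To make the ideas of contextual rules and OOV accuracy concrete, the following is a minimal sketch of Brill-style tagging, not the implementation used in this study. It assumes a most-frequent-tag baseline, a single hypothetical rule template ("change tag A to B when the previous tag is C"), a default tag of NN for unknown words, and toy English data in place of the Bahasa Madura corpus. The rule list is written by hand here; in a Brill Tagger it would be learned greedily, and in the GA variant it would presumably be searched by the Genetic Algorithm.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_sents):
    """Map each word to its most frequent tag in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag_ in sent:
            counts[word][tag_] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}


def apply_rule(tags, rule):
    """Apply one contextual rule (from_tag, to_tag, prev_tag).

    Triggers are read from the tags as they were before this rule ran
    (a delayed-effect variant, chosen here for simplicity).
    """
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def tag(words, lexicon, rules, default="NN"):
    """Baseline tagging followed by the ordered list of contextual rules."""
    tags = [lexicon.get(w, default) for w in words]  # OOV words get `default`
    for rule in rules:
        tags = apply_rule(tags, rule)
    return tags


def accuracy(gold_sents, lexicon, rules):
    """Return (overall accuracy, accuracy on OOV words only)."""
    correct = total = oov_correct = oov_total = 0
    for sent in gold_sents:
        pred = tag([w for w, _ in sent], lexicon, rules)
        for (word, gold), p in zip(sent, pred):
            total += 1
            correct += p == gold
            if word not in lexicon:
                oov_total += 1
                oov_correct += p == gold
    overall = correct / total if total else 0.0
    oov = oov_correct / oov_total if oov_total else 0.0
    return overall, oov


# Toy data (hypothetical, English for readability).
train = [
    [("the", "DT"), ("dog", "NN"), ("can", "MD"), ("run", "VB")],
    [("we", "PRP"), ("can", "MD"), ("run", "VB")],
    [("the", "DT"), ("can", "NN")],
]
held_out = [[("the", "DT"), ("can", "NN"), ("fell", "VBD")]]

lexicon = train_baseline(train)
rules = [("MD", "NN", "DT")]  # hand-written: MD -> NN after a determiner

print(tag(["the", "can", "fell"], lexicon, rules))
print(accuracy(held_out, lexicon, []))     # baseline only
print(accuracy(held_out, lexicon, rules))  # baseline + contextual rule
```

In this toy case the word "can" has multiple POS (MD and NN), and the contextual rule is what resolves it after a determiner, while the unseen word "fell" still receives the default tag. This mirrors, in miniature, why overall accuracy, OOV accuracy, and multiple-POS accuracy are reported separately.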
More From: Proceeding of the Electrical Engineering Computer Science and Informatics