A Hybrid of Rule-based and HMM-based Part-of-Speech Tagger for Indonesian

Muhammad Ridho Ananda, Ika Alfina, Muhammad Yudistira Hanifmuti

doi:10.1109/ialp54817.2021.9675180

Abstract

Aksara is an Indonesian NLP tool that conforms to the Universal Dependencies annotation guidelines. So far, Aksara can perform four tasks: word segmentation, lemmatization, POS tagging, and morphological feature analysis. However, one of its weaknesses is that it has not solved the word sense disambiguation problem. The objective of this work is to build a hybrid of rule-based and Hidden Markov Model (HMM) based POS taggers that uses the output of Aksara's rule-based POS tagger and resolves the ambiguity using an HMM and the Viterbi algorithm. We train the HMM with bigram and trigram models. The hybrid model is evaluated with 10-fold cross-validation, and the trigram model performs slightly better: it achieves 86.62% accuracy and an average F1-score of 82.32%, while the bigram model achieves 86.47% accuracy and an average F1-score of 81.55%. The experiments also show that the hybrid of rule-based and HMM-based tagging outperforms the HMM-based model alone, by a margin of 2.03% in accuracy.
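The hybrid idea can be illustrated with a minimal sketch. It assumes that the rule-based tagger returns a set of candidate tags per token and that Viterbi decoding over a bigram HMM is restricted to those candidates; the abstract does not spell out the exact interface, so this is one plausible reading rather than the authors' implementation. The function names, the add-one smoothing, and the toy training sentences are all hypothetical.

```python
import math
from collections import defaultdict

START = "<s>"  # hypothetical sentence-start symbol

def train(tagged_sentences):
    """Count bigram tag transitions and word emissions from (word, tag) sentences."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    tags, vocab = set(), set()
    for sent in tagged_sentences:
        prev = START
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            tags.add(tag)
            vocab.add(word.lower())
            prev = tag
    return trans, emit, sorted(tags), len(vocab)

def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability (smoothing choice is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))

def viterbi(words, candidates, model):
    """Decode the best tag path; candidates[i] restricts the tags of word i.

    An empty candidate set means the rule-based tagger gave no hint,
    so every known tag is considered for that position.
    """
    trans, emit, tags, vocab_size = model
    best, back = [{}], [{}]
    for t in (candidates[0] or tags):
        best[0][t] = (log_prob(trans[START], t, len(tags))
                      + log_prob(emit[t], words[0].lower(), vocab_size))
        back[0][t] = None
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in (candidates[i] or tags):
            # Pick the best previous tag among those kept at position i - 1.
            prev, score = max(
                ((p, best[i - 1][p] + log_prob(trans[p], t, len(tags)))
                 for p in best[i - 1]),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_prob(emit[t], words[i].lower(), vocab_size)
            back[i][t] = prev
    # Follow back-pointers from the best final tag.
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy, made-up training data: "bisa" is ambiguous between AUX and NOUN.
TRAIN = [
    [("Dia", "PRON"), ("bisa", "AUX"), ("makan", "VERB")],
    [("Saya", "PRON"), ("bisa", "AUX"), ("tidur", "VERB")],
    [("Ular", "NOUN"), ("itu", "DET"), ("punya", "VERB"), ("bisa", "NOUN")],
]
model = train(TRAIN)
words = ["Dia", "bisa", "tidur"]
# Hypothetical rule-based output: one candidate set per token.
candidates = [{"PRON"}, {"AUX", "NOUN"}, {"VERB"}]
tags_out = viterbi(words, candidates, model)
```

On this toy data the decoder resolves the ambiguous token "bisa" to AUX, because the PRON-to-AUX transition and the AUX emission both outweigh the NOUN alternative. A trigram variant would condition each transition on the two preceding tags instead of one.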
