Abstract
Aksara is an Indonesian NLP tool that conforms to the Universal Dependencies annotation guidelines. It currently performs four tasks: word segmentation, lemmatization, POS tagging, and morphological feature analysis. One of its weaknesses is that it has not solved the word sense disambiguation problem. The objective of this work is to build a hybrid of rule-based and Hidden Markov Model (HMM) based POS taggers that uses the output of Aksara's rule-based POS tagger and resolves the ambiguity using an HMM and the Viterbi algorithm. We train the HMM with bigram and trigram models. The hybrid model is evaluated with 10-fold cross-validation, and the trigram model performs slightly better: it achieves 86.62% accuracy and an average F1-score of 82.32%, while the bigram model achieves 86.47% accuracy and an average F1-score of 81.55%. The experiments also show that the hybrid of rule-based and HMM-based tagging outperforms the HMM-based model alone by a margin of 2.03% in accuracy.
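As a rough illustration of the hybrid idea, the sketch below restricts Viterbi decoding to the candidate tags that a rule-based step proposes for each word. It is not Aksara's code or API: the function names, the add-one smoothing, the bigram-only model, and the three-sentence toy corpus are all assumptions made for this example, and the candidate sets are written by hand to stand in for the rule-based tagger's output.

```python
import math
from collections import defaultdict


def train_bigram_hmm(tagged_sentences):
    """Count tag-to-tag transitions and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit


def log_prob(counts, given, event, n_outcomes):
    """Add-one smoothed log P(event | given); the smoothing is an assumption."""
    row = counts.get(given, {})
    return math.log((row.get(event, 0) + 1) / (sum(row.values()) + n_outcomes))


def viterbi(words, candidates, trans, emit, n_tags, vocab_size):
    """Find the best tag path, where candidates[i] is the set of tags
    allowed for words[i] (here a stand-in for rule-based tagger output)."""
    # best[i][tag] = (log-probability of the best path ending in tag, previous tag)
    best = [{}]
    for tag in candidates[0]:
        score = (log_prob(trans, "<s>", tag, n_tags)
                 + log_prob(emit, tag, words[0].lower(), vocab_size + 1))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in candidates[i]:
            emission = log_prob(emit, tag, words[i].lower(), vocab_size + 1)
            best[i][tag] = max(
                (best[i - 1][p][0] + log_prob(trans, p, tag, n_tags) + emission, p)
                for p in best[i - 1]
            )
    # Follow the back-pointers from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Toy training data, invented for this example.
# "bisa" is ambiguous: auxiliary "can" or noun "venom".
corpus = [
    [("saya", "PRON"), ("bisa", "AUX"), ("makan", "VERB")],
    [("bisa", "NOUN"), ("ular", "NOUN"), ("berbahaya", "ADJ")],
    [("dia", "PRON"), ("bisa", "AUX"), ("tidur", "VERB")],
]
trans, emit = train_bigram_hmm(corpus)
tags = {tag for sentence in corpus for _, tag in sentence}
vocab = {word.lower() for sentence in corpus for word, _ in sentence}

words = ["dia", "bisa", "makan"]
candidates = [{"PRON"}, {"AUX", "NOUN"}, {"VERB"}]  # hand-written candidate sets
result = viterbi(words, candidates, trans, emit, len(tags), len(vocab))
print(list(zip(words, result)))
```

Restricting each position to its candidate set means the HMM only has to choose among tags the rules consider possible. A trigram variant would condition each transition on the two preceding tags instead of one.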