Abstract

Part-of-Speech (POS) tagging is the process of assigning a word class to each word in a sentence. One problem in POS tagging is ambiguity: some words are spelled the same but take different POS tags depending on the context of the sentence. This problem can be approached with the Hidden Markov Model (HMM) N-gram algorithm combined with the Viterbi algorithm. This study develops an Indonesian POS tagging system using the HMM N-gram algorithm (bigram and trigram) and the Viterbi algorithm, and compares the results of the HMM bigram and HMM trigram models. A manually labeled Indonesian corpus, the Indonesian Manually Tagged Corpus, serves as the system's knowledge source. The corpus is first processed with the HMM N-gram algorithm to obtain the rules. The Viterbi algorithm then uses these rules to determine the most probable sequence of POS tags. The highest accuracy, 77.56%, was obtained with the HMM bigram and Viterbi algorithm, while the HMM trigram and Viterbi algorithm reached a highest accuracy of 61.67%. The results show that the HMM N-gram and Viterbi approach can resolve tag ambiguity, and that the HMM bigram achieves higher accuracy than the HMM trigram.
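The pipeline described above (estimate an HMM from a tagged corpus, then decode with Viterbi) can be sketched for the bigram case as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how smoothing or unknown words are handled, so add-one (Laplace) smoothing and the `START` pseudo-tag are assumptions, and `TOY_CORPUS` is a hypothetical three-sentence corpus standing in for the Indonesian Manually Tagged Corpus. The function names `train_bigram_hmm`, `log_prob`, and `viterbi` are likewise illustrative. A trigram variant would condition each transition on the two previous tags instead of one.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # pseudo-tag marking the sentence start (assumption)


def train_bigram_hmm(tagged_sentences):
    """Count tag-bigram transitions and word emissions from (word, tag) sentences."""
    trans = defaultdict(Counter)  # trans[prev_tag][tag] -> count
    emit = defaultdict(Counter)   # emit[tag][word] -> count
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            word = word.lower()
            trans[prev][tag] += 1
            emit[tag][word] += 1
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(emit), vocab


def log_prob(counts, key, n_outcomes):
    """Add-one (Laplace) smoothed log-probability; the smoothing is an assumption."""
    return math.log((counts[key] + 1) / (sum(counts.values()) + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    words = [w.lower() for w in words]
    n_tags, n_words = len(tags), len(vocab) + 1  # +1 reserves mass for unknown words
    # best[i][t] = (log-score of the best path ending in tag t at position i, backpointer)
    best = [{t: (log_prob(trans[START], t, n_tags)
                 + log_prob(emit[t], words[0], n_words), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            # Pick the previous tag that maximizes path score times transition probability.
            prev, score = max(
                ((p, best[i - 1][p][0] + log_prob(trans[p], t, n_tags)) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (score + log_prob(emit[t], words[i], n_words), prev)
        best.append(column)
    # Backtrack from the highest-scoring final tag.
    path = [max(tags, key=lambda t: best[-1][t][0])]
    for i in range(len(words) - 1, 0, -1):
        path.append(best[i][path[-1]][1])
    return path[::-1]


# Hypothetical toy corpus: "bisa" is a modal ("can") or a noun ("venom").
TOY_CORPUS = [
    [("saya", "PRP"), ("bisa", "MD"), ("makan", "VB")],
    [("dia", "PRP"), ("bisa", "MD"), ("tidur", "VB")],
    [("bisa", "NN"), ("ular", "NN"), ("berbahaya", "JJ")],
]

if __name__ == "__main__":
    model = train_bigram_hmm(TOY_CORPUS)
    print(viterbi(["dia", "bisa", "makan"], *model))       # ['PRP', 'MD', 'VB']
    print(viterbi(["bisa", "ular", "berbahaya"], *model))  # ['NN', 'NN', 'JJ']
```

In this toy setting the ambiguous word "bisa" receives MD after a pronoun and NN at the start of a noun phrase, because the transition probabilities from the neighboring tags outweigh its emission probabilities. Log-probabilities are used so that long sentences do not underflow.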
