Analysis Part of Speech Tagging Using Hidden Markov Model on Qur'an Data

Arief Fatchul Huda, Dedi Supriadi, Aep Saepuloh, Muhammad Hafidz Naufal Hilal

doi:10.1109/icwt52862.2021.9678472

Abstract

Part-of-speech (POS) tagging is a Natural Language Processing task that automatically assigns the correct label to each word of an input sentence. Relatively little work has been done on POS tagging techniques for Arabic. This study applies POS tagging with a Hidden Markov Model (HMM) to the text of the Qur'an. The dataset, drawn from a Qur'an corpus, consists of three categories: 150 simple sentences, 50 sentences containing more than one S/P/O/K (subject/predicate/object/adverbial) element (compound sentences), and 50 selected verses of the Qur'an (complex sentences). Experiments were carried out using K-fold cross-validation, with the dataset divided into training data and test data. The training data is used to estimate the emission and transition probabilities, and on the test data the Viterbi algorithm is used to determine the best tag for each word. The experiments yielded an average accuracy of 89.44% for the first category (simple sentences), 74.18% for the second category (compound sentences), and 69.04% for the third category (complex sentences).
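The pipeline summarized in the abstract (estimate transition and emission probabilities from training data, decode test sentences with Viterbi, average accuracy over K folds) can be sketched as follows. This is a minimal illustration, not the authors' implementation: add-one smoothing, the `<s>` start symbol, token-level accuracy, and the interleaved fold split are assumptions, since the abstract does not specify them. The function names `train_hmm`, `viterbi`, and `k_fold_accuracy` and the transliterated toy corpus are hypothetical.

```python
import math
from collections import Counter


def train_hmm(tagged_sentences):
    """Estimate log transition and emission probabilities from (word, tag) sentences.

    Add-one (Laplace) smoothing is an assumption made here so that unseen
    words and tag pairs still get a small nonzero probability.
    """
    trans = Counter()       # (prev_tag, tag) counts; "<s>" marks sentence start
    emit = Counter()        # (tag, word) counts
    prev_count = Counter()  # how often each tag (or "<s>") precedes another tag
    tag_count = Counter()   # how often each tag emits a word
    vocab = set()
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            trans[(prev, tag)] += 1
            prev_count[prev] += 1
            emit[(tag, word)] += 1
            tag_count[tag] += 1
            vocab.add(word)
            prev = tag
    tags = sorted(tag_count)
    vocab_size = len(vocab) + 1  # +1 reserves probability mass for unknown words

    def log_trans(prev, tag):
        return math.log((trans[(prev, tag)] + 1) / (prev_count[prev] + len(tags)))

    def log_emit(tag, word):
        return math.log((emit[(tag, word)] + 1) / (tag_count[tag] + vocab_size))

    return tags, log_trans, log_emit


def viterbi(words, tags, log_trans, log_emit):
    """Return the most probable tag sequence for `words` under the HMM (log space)."""
    if not words:
        return []
    # best[i][t]: highest log-probability of any tag path ending in tag t at position i
    best = [{t: log_trans("<s>", t) + log_emit(t, words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + log_trans(p, t)) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + log_emit(t, words[i])
            back[i][t] = prev
    # Follow back-pointers from the best final tag
    path = [max(best[-1], key=best[-1].get)]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


def k_fold_accuracy(data, k):
    """Average token-level accuracy over k folds (interleaved split is an assumption)."""
    scores = []
    for f in range(k):
        test = data[f::k]
        train = [s for i, s in enumerate(data) if i % k != f]
        tags, lt, le = train_hmm(train)
        correct = total = 0
        for sent in test:
            words = [w for w, _ in sent]
            gold = [t for _, t in sent]
            pred = viterbi(words, tags, lt, le)
            correct += sum(p == g for p, g in zip(pred, gold))
            total += len(gold)
        scores.append(correct / total)
    return sum(scores) / len(scores)


# Hypothetical toy corpus (transliterated tokens, simplified tags); not the paper's data.
toy = [
    [("qala", "V"), ("al-rajulu", "N")],
    [("kataba", "V"), ("al-waladu", "N")],
    [("dhahaba", "V"), ("al-rajulu", "N"), ("ila", "P"), ("al-bayti", "N")],
    [("qala", "V"), ("al-waladu", "N")],
]

tags, lt, le = train_hmm(toy)
print(viterbi(["kataba", "al-waladu"], tags, lt, le))
print(k_fold_accuracy(toy, k=2))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over long verses; with only the three toy tags, the tagger labels `kataba al-waladu` as `V N`.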
