Abstract

Part-of-speech (POS) tagging is a Natural Language Processing task that automatically assigns the correct word-class label to each word of an input sentence. POS tagging techniques for Arabic remain relatively underdeveloped. This study applies POS tagging with a Hidden Markov Model (HMM) to the text of the Qur'an. The dataset is drawn from the Qur'an corpus and consists of three categories: 150 simple sentences, 50 compound sentences (sentences with more than one S/P/O/K), and 50 selected verses of the Qur'an (complex sentences). Experiments were run using K-fold cross-validation, with the dataset divided into training data and test data. The training data is used to estimate the emission and transition probabilities, while on the test data the Viterbi algorithm determines the best tag for each word. The experiments yielded an average accuracy of 89.44% for simple sentences, 74.18% for compound sentences, and 69.04% for complex sentences.
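
As an illustration of the pipeline described above, the following is a minimal sketch of HMM training (counting transitions and emissions) followed by Viterbi decoding. It is not the authors' implementation: the function names, the `START` marker, the add-alpha smoothing, and the toy English sentences are all assumptions made for the example, and the abstract does not state how unseen words were handled.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical start-of-sentence marker (an assumption)

def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from lists of (word, tag) pairs."""
    transitions = defaultdict(Counter)  # transitions[prev_tag][tag] = count
    emissions = defaultdict(Counter)    # emissions[tag][word] = count
    vocab = set()
    for sentence in tagged_sentences:
        prev = START
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab

def log_prob(counter, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (the smoothing is an assumption)."""
    total = sum(counter.values())
    return math.log((counter[key] + alpha) / (total + alpha * n_outcomes))

def viterbi(words, transitions, emissions, vocab):
    """Return the most probable tag sequence for `words` under the counted model."""
    if not words:
        return []
    tags = sorted(emissions)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot reserved for unseen words
    # best[i][tag] = (log score of the best path ending in tag at position i, previous tag)
    best = [{}]
    for tag in tags:
        score = (log_prob(transitions[START], tag, n_tags)
                 + log_prob(emissions[tag], words[0], n_words))
        best[0][tag] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for tag in tags:
            emit = log_prob(emissions[tag], words[i], n_words)
            best[i][tag] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], tag, n_tags) + emit, p)
                for p in tags
            )
    # Backtrack from the best final tag to recover the full path.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    path.reverse()
    return path

# Toy training data, invented for illustration only (not from the Qur'an corpus).
toy_corpus = [
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("a", "DET"), ("dog", "NOUN"), ("runs", "VERB")],
    [("the", "DET"), ("dog", "NOUN"), ("sleeps", "VERB")],
]
transitions, emissions, vocab = train_hmm(toy_corpus)
predicted = viterbi(["a", "cat", "runs"], transitions, emissions, vocab)
```

Working in log space avoids numerical underflow on longer sentences. In a K-fold setup, `train_hmm` would be called on the training folds and `viterbi` on each sentence of the held-out fold, with accuracy computed as the fraction of correctly tagged words.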
