Arabic Part Of Speech (POS) Tagging Analysis using Bee Colony Optimization (BCO) Algorithm on Quran Corpus

Arief Fatchul Huda, Dian Rachmat Gumelar, Elis Ratnawulan, Fauziah Fauziah

doi:10.1109/icwt52862.2021.9678422

Abstract

Part-of-Speech (POS) tagging is the automated process of assigning each word its appropriate grammatical label, or syntactic category, based on its context. POS tagging is an important step in Natural Language Processing (NLP) applications such as text summarization, Speech Recognition (SR), Question Answering (QA) and Information Retrieval (IR). Automatic POS tagging is needed because manual tagging is slow and expensive, since it requires a linguist. The main problems in automatic POS tagging are ambiguous words, whose properties change depending on the context in which they appear, and Out-of-Vocabulary (OOV) words, which appear in the test corpus but not in the training corpus. This study proposes an efficient POS tagging approach for Arabic text based on the Bee Colony Optimization (BCO) algorithm. The POS tagging problem is represented as a graph, and a new weighting technique is proposed that assigns each word class label a transition value that need not be a probability; the bees then search the graph for the best solution path. The dataset is drawn from the transliterated Quranic corpus and has three categories: 150 simple perfect sentences (easy), 50 sentences with more than one S/P/O/K (medium), and 50 selected Quran verses (difficult). The proposed approach is evaluated using k-fold cross-validation. The results show an average accuracy of 100% for the easy category, 98.96% for the medium category, and 94.96% for the difficult category.
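The abstract does not give the weighting formula or the BCO parameters, so the following Python sketch only illustrates the general idea under stated assumptions; it is not the authors' implementation. It assumes a tiny hypothetical tag lattice (`LATTICE`) with made-up word-tag weights and made-up non-probabilistic transition weights (`TRANS`, with `<s>` as a sentence-start marker). The search follows the basic BCO scheme: in each forward pass every bee extends its partial tag path by `nc` tags, choosing tags with probability proportional to their weights; in each backward pass the bees' partial scores are normalized to [0, 1], each bee stays loyal with probability exp(-(O_max - O_b)/u), where u counts the forward passes so far, and uncommitted bees copy the path of a loyal recruiter chosen in proportion to its score. All function and parameter names are hypothetical.

```python
import math
import random
from itertools import product

# Hypothetical toy lattice (NOT from the Quran corpus): for each word position,
# the candidate tags and an assumed word-tag weight.
LATTICE = [
    {"N": 0.9, "V": 0.2},
    {"V": 0.8, "N": 0.3, "P": 0.1},
    {"N": 0.7, "P": 0.4},
]

# Hypothetical non-probabilistic transition weights; "<s>" marks sentence start.
TRANS = {
    ("<s>", "N"): 1.0, ("<s>", "V"): 0.5,
    ("N", "N"): 0.3, ("N", "V"): 1.0, ("N", "P"): 0.4,
    ("V", "N"): 0.9, ("V", "V"): 0.1, ("V", "P"): 0.6,
    ("P", "N"): 0.8, ("P", "V"): 0.2, ("P", "P"): 0.1,
}
EPS = 0.01  # assumed fallback weight for unseen transitions


def step_weight(lattice, trans, i, prev, tag):
    """Positive weight of assigning `tag` to word i after tag `prev`."""
    return lattice[i][tag] + trans.get((prev, tag), EPS)


def path_score(lattice, trans, path):
    """Sum of step weights along a (possibly partial) tag path."""
    score, prev = 0.0, "<s>"
    for i, tag in enumerate(path):
        score += step_weight(lattice, trans, i, prev, tag)
        prev = tag
    return score


def bco_tag(lattice, trans, n_bees=10, nc=1, iterations=30, seed=0):
    """Search the tag graph with a basic BCO loop; returns (best_path, best_score)."""
    rng = random.Random(seed)
    n = len(lattice)
    best_path, best_score = None, -math.inf
    for _ in range(iterations):
        partial = [[] for _ in range(n_bees)]
        u = 0  # number of forward passes done in this iteration
        while len(partial[0]) < n:
            u += 1
            # Forward pass: every bee extends its path by up to nc tags,
            # choosing each tag with probability proportional to its weight.
            for path in partial:
                for _ in range(nc):
                    if len(path) == n:
                        break
                    i = len(path)
                    prev = path[-1] if path else "<s>"
                    tags = list(lattice[i])
                    weights = [step_weight(lattice, trans, i, prev, t) for t in tags]
                    path.append(rng.choices(tags, weights=weights)[0])
            # Backward pass: normalize partial scores to [0, 1] (so O_max = 1).
            scores = [path_score(lattice, trans, p) for p in partial]
            lo, hi = min(scores), max(scores)
            norm = [(s - lo) / (hi - lo) if hi > lo else 1.0 for s in scores]
            # Loyalty: p_b = exp(-(O_max - O_b) / u); the best bee is always loyal.
            loyal = [rng.random() < math.exp(-(1.0 - o) / u) for o in norm]
            recruiters = [b for b in range(n_bees) if loyal[b]]
            rec_w = [norm[b] + 1e-9 for b in recruiters]
            for b in range(n_bees):
                if not loyal[b] and recruiters:
                    # Uncommitted bee copies a recruiter chosen in proportion to O_b.
                    r = rng.choices(recruiters, weights=rec_w)[0]
                    partial[b] = list(partial[r])
        for p in partial:
            s = path_score(lattice, trans, p)
            if s > best_score:
                best_path, best_score = list(p), s
    return best_path, best_score


def brute_force(lattice, trans):
    """Exhaustive check, feasible only for toy lattices."""
    best = max((list(p) for p in product(*lattice)),
               key=lambda p: path_score(lattice, trans, p))
    return best, path_score(lattice, trans, best)


if __name__ == "__main__":
    print(bco_tag(LATTICE, TRANS, seed=1))
    print(brute_force(LATTICE, TRANS))
```

Because the toy lattice has only 2 × 3 × 2 = 12 paths, the brute-force function serves as a sanity check on the BCO result; on real sentences the graph grows exponentially with sentence length, which is why a heuristic search is used instead. The parameter `nc` controls how many words each bee tags per forward pass before the bees compare and recruit.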

Full Text