Boosting Arabic Named Entity Recognition with K-Fold Cross Validation on LSTM and Bi-LSTM Models

Hamid Sadeq Mahdi Alsultani, Ahmed H Aliwy

doi:10.3844/jcssp.2022.792.800

Abstract

Named Entity Recognition (NER) is one of the most important Information Extraction (IE) use cases and is used to improve the performance of Natural Language Processing (NLP) tasks such as Relation Extraction (RE) and Question Answering (QA). Recently, researchers have tackled Arabic NER in different ways. In this study, we assess the performance of two widely used models, LSTM and Bi-LSTM, on the NER task in the Arabic language and perform a comparative study between these models. In contrast to the traditional data partition technique widely used during training, we employ k-fold cross-validation to improve the performance of each model. The experimental results reveal that the performance of all models improves when k-fold cross-validation is applied. Additionally, according to our experimental results, the Bi-LSTM model outperforms the LSTM model in terms of our evaluation metric. We achieve the best F1 score of 94.17% with CNN-Bi-LSTM-CRF. An ablation study on k-fold cross-validation demonstrates that the F1 score increased from 87.28% to 94.17%.
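To make the two ingredients named in the abstract concrete, the following is a minimal sketch of k-fold partitioning and of the F1 score, not the authors' implementation. The function names `k_fold_indices` and `f1_score`, the contiguous (unshuffled) fold layout, and the count-based F1 inputs are assumptions made for illustration; the abstract does not state the value of k, whether the data are shuffled, or how entity matches are counted.

```python
from typing import List, Tuple


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Partition range(n_samples) into k contiguous folds (hypothetical helper).

    Returns one (train_indices, validation_indices) pair per fold, so every
    sample is used for validation exactly once and for training k - 1 times.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    base, extra = divmod(n_samples, k)
    folds = []
    start = 0
    for i in range(k):
        # The first `extra` folds receive one additional sample.
        size = base + (1 if i < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, validation))
        start = stop
    return folds


def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Harmonic mean of precision and recall, computed from raw counts."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Assumed toy setting: 10 sentences, 5 folds.
    for fold_id, (train, validation) in enumerate(k_fold_indices(10, 5)):
        print(fold_id, "train:", train, "validation:", validation)
```

In a setup like the one described, a model would be trained once per fold on the training indices and scored on the validation indices, with the per-fold F1 scores then aggregated.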
