Abstract
Although pre-trained word embedding models have advanced a wide range of natural language processing applications, they ignore the contextual information and meaning within the text. In this paper, we investigate the potential of the pre-trained Arabic BERT (Bidirectional Encoder Representations from Transformers) model to learn universal contextualized sentence representations, aiming to showcase its usefulness for Arabic multi-class text categorization. We propose to exploit the pre-trained AraBERT model for contextual text representation learning in two different ways: as a transfer learning model and as a feature extractor. On the one hand, we fine-tune the parameters of the Arabic BERT (AraBERT) model on the OSAC datasets to transfer its knowledge to Arabic text categorization. On the other hand, we examine AraBERT's performance as a feature extractor by combining it with several classifiers, including CNN, LSTM, Bi-LSTM, MLP, and SVM. Finally, we conduct an exhaustive set of experiments comparing two BERT models, namely AraBERT and multilingual BERT. The findings show that the fine-tuned AraBERT model achieves state-of-the-art results, attaining up to 99% in terms of both F1-score and accuracy.
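To make the two usages of AraBERT described above concrete, the following is a minimal sketch (not the authors' code) of both routes using the Hugging Face Transformers library: fine-tuning the pre-trained model end to end for multi-class categorization, and freezing it as a feature extractor whose [CLS] vectors feed a separate SVM classifier. The checkpoint name, number of classes, and hyperparameters are illustrative assumptions rather than details taken from the paper.

```python
# Sketch 1: fine-tuning AraBERT for multi-class text categorization.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "aubmindlab/bert-base-arabertv02"  # assumed AraBERT checkpoint
NUM_CLASSES = 10                                # assumed number of OSAC categories

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_CLASSES
)

# Tokenize a toy batch of Arabic documents and run a single training step.
texts = ["نص تجريبي للتصنيف", "مثال آخر من مجموعة البيانات"]
labels = torch.tensor([0, 3])
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=256, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss over NUM_CLASSES
outputs.loss.backward()
optimizer.step()

# Sketch 2: feature-extractor variant - freeze the encoder, take the [CLS]
# representation as a document vector, and train an SVM on top of it.
from transformers import AutoModel
from sklearn.svm import SVC

encoder = AutoModel.from_pretrained(MODEL_NAME)
encoder.eval()
with torch.no_grad():
    features = encoder(**batch).last_hidden_state[:, 0, :]  # [CLS] vectors

svm = SVC(kernel="linear")
svm.fit(features.numpy(), labels.numpy())
```

In practice, the fine-tuning route updates all encoder weights jointly with the classification head, whereas the feature-extractor route keeps the encoder fixed and only trains the downstream classifier (SVM here; the same vectors could feed a CNN, LSTM, Bi-LSTM, or MLP).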