Impact of Using Bidirectional Encoder Representations from Transformers (BERT) Models for Arabic Dialogue Acts Identification

Alaa Joukhadar,Ghaida Rebdawi,Nada Ghneim

doi:10.18280/isi.260506

Abstract

In Human-Computer dialogue systems, the correct identification of the intent underlying a speaker's utterance is crucial to the success of a dialogue. Several researches have studied the Dialogue Act Classification (DAC) task to identify Dialogue Acts (DA) for different languages. Recently, the emergence of Bidirectional Encoder Representations from Transformers (BERT) models, enabled establishing state-of-the-art results for a variety of natural language processing tasks in different languages. Very few researches have been done in the Arabic Dialogue acts identification task. The BERT representation model has not been studied yet in Arabic Dialogue acts detection task. In this paper, we propose a model using BERT language representation to identify Arabic Dialogue Acts. We explore the impact of using different BERT models: AraBERT Original (v0.1, v1), AraBERT Base (v0.2, and v2) and AraBERT Large (v0.2, and v2), which are pretrained on different Arabic corpora (different in size, morphological segmentation, language model window, …). The comparison was performed on two available Arabic datasets. Using AraBERTv0.2-base model for dialogue representations outperformed all other pretrained models. Moreover, we compared the performance of AraBERTv0.2-base model to the state-of-the-art approaches applied on the two datasets. The comparison showed that this representation model outperformed the performance both state-of-the-art models.

Full Text