Abstract

Question classification is one of the most critical phases of question answering systems: it narrows the answer search space by assigning a predefined class label to each question. Recently, contextualized word representation methods based on deep learning have achieved state-of-the-art performance across many Natural Language Processing tasks, yet few works have applied these representations to Arabic question classification. In this research, we propose an Arabic question classification method built on a sentence-transformer representation, and we investigate the fusion of several word representations, including Bidirectional Encoder Representations from Transformers (BERT), Embeddings from Language Models (ELMo), and word embeddings enriched with subword information (W2V). Our contribution is threefold. First, our method handles out-of-vocabulary words. Second, we apply the BERT representation to extract the most valuable features from words and thereby construct a better question representation. Third, we study the impact of fusing various word embeddings on Arabic question classification. To evaluate the proposed models, we perform stratified 5-fold cross-validation on a dataset of 3,173 questions labeled with both the Arabic and the Li & Roth taxonomies. The experimental results show that all our models surpass previous work on the Arabic question classification task. For the Arabic taxonomy, our AraBERT-based model achieves the highest accuracy of 94.20%; for the Li & Roth taxonomy, the model based on the concatenation of AraBERT and W2V representations achieves the best overall accuracy of 93.51%.
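To make the fusion step concrete, the sketch below concatenates mean-pooled AraBERT sentence embeddings with averaged subword-aware fastText vectors (one plausible realization of the BERT + W2V fusion described above, where fastText subwords are what let the representation cover out-of-vocabulary words) and scores the fused representation with stratified 5-fold cross-validation. The Hugging Face model name, the fastText training settings, and the logistic-regression classifier are illustrative assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of BERT + W2V embedding fusion for question classification.
# Assumptions (not from the paper): AraBERT checkpoint name, mean pooling,
# fastText hyperparameters, and a logistic-regression classifier.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel
from gensim.models import FastText
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score


def bert_embed(questions, model_name="aubmindlab/bert-base-arabertv02"):
    """Mean-pooled last-hidden-state embeddings; one vector per question."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    vecs = []
    with torch.no_grad():
        for q in questions:
            enc = tok(q, return_tensors="pt", truncation=True, max_length=64)
            hidden = model(**enc).last_hidden_state        # (1, seq_len, 768)
            vecs.append(hidden.mean(dim=1).squeeze(0).numpy())
    return np.vstack(vecs)


def w2v_embed(questions, dim=300):
    """Average of subword-aware fastText vectors (covers OOV words)."""
    tokenized = [q.split() for q in questions]  # assumes non-empty questions
    ft = FastText(sentences=tokenized, vector_size=dim, min_count=1, epochs=10)
    return np.vstack(
        [np.mean([ft.wv[w] for w in toks], axis=0) for toks in tokenized]
    )


def fused_cv_accuracy(questions, labels):
    """Concatenate both representations, then stratified 5-fold CV accuracy."""
    X = np.hstack([bert_embed(questions), w2v_embed(questions)])  # fusion
    y = np.asarray(labels)
    skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))
    return float(np.mean(scores))
```

Concatenation is the simplest fusion scheme; it keeps the two embedding spaces intact and lets the downstream classifier weight them, at the cost of a higher-dimensional feature vector.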
