Abstract

Identifying the category of a question is an essential task in question answering systems. Contextual continuous word representations have proved effective in various natural language processing tasks; however, these models have not been considered in the field of Arabic question classification. In this paper, we investigate the use of a contextual word representation, Embeddings from Language Models (ELMo), for Arabic question classification. We study the behaviour of this representation by building numerous neural network architectures trained to classify questions. The dataset contains 3173 questions annotated with two taxonomies: an Arabic taxonomy and an updated Li & Roth taxonomy. Compared with word2vec enriched with subword information, a context-free representation, we show that ELMo achieves better performance with a reduced word-vector size. The best classifier achieves up to 94% in terms of accuracy, macro F1 score, and weighted F1 score.
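The pipeline the abstract describes, contextual token embeddings pooled into a fixed-size question vector and fed to a neural classifier, can be sketched minimally as follows. This is an illustration only, not the authors' implementation: the random vectors stand in for real ELMo outputs (which are typically 1024-dimensional), the labels are hypothetical question categories, and the classifier is a single softmax layer rather than the architectures studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in dimensions: real ELMo vectors are much larger (e.g. 1024-dim).
EMB_DIM, N_CLASSES = 16, 3

def embed_question(n_tokens):
    # Placeholder for a contextual embedder: one vector per token.
    return rng.normal(size=(n_tokens, EMB_DIM))

def pool(token_vecs):
    # Mean-pool token vectors into a fixed-size question representation.
    return token_vecs.mean(axis=0)

# Toy training set: 60 questions of 3-10 tokens with random class labels.
X = np.stack([pool(embed_question(rng.integers(3, 11))) for _ in range(60)])
y = rng.integers(0, N_CLASSES, size=60)

# Single softmax layer trained by gradient descent on cross-entropy loss.
W = np.zeros((EMB_DIM, N_CLASSES))
b = np.zeros(N_CLASSES)
for _ in range(200):
    logits = X @ W + b
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    grad = probs.copy()
    grad[np.arange(len(y)), y] -= 1          # d(loss)/d(logits)
    W -= 0.1 * (X.T @ grad) / len(y)
    b -= 0.1 * grad.mean(axis=0)

pred = (X @ W + b).argmax(axis=1)
accuracy = (pred == y).mean()
```

In practice the pooled question vector would be produced by a pretrained ELMo model over the Arabic question text, and the softmax layer would be replaced by one of the neural architectures the paper compares.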
