DAQAS: Deep Arabic Question Answering System based on duplicate question detection and machine reading comprehension

Hamza Alami, Abdelkader El Mahdaouy, Abdessamad Benlahbib, Noureddine En-Nahnahi, Ismail Berrada, Said El Alaoui Ouatik

doi:10.1016/j.jksuci.2023.101709

Abstract

Recently, deep learning techniques have outperformed feature-based and shallow learning techniques in open-domain question answering systems (OpenQAS). However, only a few works have adopted these techniques to build Arabic OpenQAS that can extract exact answers from large information sources (e.g., Wikipedia). In addition, no available Arabic OpenQAS integrates a module that identifies duplicate questions to shorten response time and reduce computation cost. In this paper, we propose an Arabic OpenQAS (named DAQAS) based on deep learning methods. It consists of three components: (1) Dense Duplicate Question Detection, which returns answers to questions that have already been answered; (2) a Retriever based on BM25 and Query Expansion by neural text generation; and (3) a Reader that extracts exact answers given a question and the retrieved passages that probably contain the answer. All components of our system integrate deep learning models, especially transformer-based techniques, which have achieved state-of-the-art results in different NLP fields. We performed several experiments with publicly available question answering datasets to show the effectiveness of our system. DAQAS obtained promising results, achieving 21.77% Exact Match and a 54.71% F1 score when using only the top 5 retrieved passages.
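For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Exact Match and token-level F1 are commonly computed for extractive question answering. It is an illustration under stated assumptions, not the evaluation script used for DAQAS: the function names are hypothetical, and the normalization (lowercasing plus whitespace tokenization) is a simplification that omits any Arabic-specific preprocessing such as diacritic removal.

```python
from collections import Counter


def normalize(text: str) -> list[str]:
    # Assumption: lowercase and split on whitespace only.
    # A real Arabic evaluation would likely add further normalization.
    return text.lower().split()


def exact_match(prediction: str, reference: str) -> float:
    # 1.0 when the normalized token sequences are identical, else 0.0.
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    # Token-level F1: harmonic mean of precision and recall
    # over the multiset of tokens shared by both answers.
    pred_tokens = normalize(prediction)
    ref_tokens = normalize(reference)
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Corpus-level scores are typically the average of these per-question values, which helps explain why F1 can be much higher than Exact Match: a predicted span that only partially overlaps the reference answer earns partial F1 credit but no Exact Match credit.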

Full Text