Arabic machine reading comprehension on the Holy Qur’an using CL-AraBERT

Rana Malhas, Tamer Elsayed

doi:10.1016/j.ipm.2022.103068


Open Access


Journal: Information Processing & Management
Publication Date: Sep 9, 2022
Citations: 17
License type: CC BY-NC-ND

Affiliation: Qatar University


Abstract


In this work, we tackle the problem of machine reading comprehension (MRC) on the Holy Qur’an to address the lack of Arabic datasets and systems for this important task. We construct QRCD, the first Qur’anic Reading Comprehension Dataset, composed of 1,337 question-passage-answer triplets for 1,093 question-passage pairs, of which 14% are multi-answer questions. We then introduce CLassical-AraBERT (CL-AraBERT for short), a new AraBERT-based pre-trained model that is further pre-trained on a Classical Arabic (CA) dataset of about 1.0B words, to complement the Modern Standard Arabic (MSA) resources used in pre-training the initial model and make it a better fit for the task. Finally, we leverage cross-lingual transfer learning from MSA to CA and fine-tune CL-AraBERT as a reader, first on two MSA-based MRC datasets and then on our QRCD dataset, to build what is, to the best of our knowledge, the first MRC system on the Holy Qur’an. To evaluate our system, we introduce Partial Average Precision (pAP), an adaptation of the traditional rank-based Average Precision measure that integrates partial matching into the evaluation over both multi-answer and single-answer MSA questions. Using two experimental setups (hold-out and cross-validation (CV)), we empirically show that the fine-tuned CL-AraBERT reader significantly outperforms the baseline fine-tuned AraBERT reader by 6.12 and 3.75 pAP points in the hold-out and CV setups, respectively. To promote further research on this task and on related tasks involving the Qur’an and Classical Arabic text, we make both the QRCD dataset and the pre-trained CL-AraBERT model publicly available.
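The abstract does not spell out how pAP is computed. The Python sketch below shows one plausible way to fold partial matching into rank-based Average Precision. It assumes, hypothetically, that each ranked answer earns a partial relevance score equal to its best SQuAD-style token-overlap F1 against a not-yet-matched gold answer, and that this score replaces the binary relevance of standard AP. The names `token_f1` and `partial_average_precision` are illustrative, not the authors' released code, and the paper's exact definition may differ.

```python
from collections import Counter


def token_f1(pred: str, gold: str) -> float:
    """SQuAD-style token-overlap F1 between two whitespace-tokenized strings."""
    p, g = pred.split(), gold.split()
    overlap = sum((Counter(p) & Counter(g)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p)
    recall = overlap / len(g)
    return 2 * precision * recall / (precision + recall)


def partial_average_precision(ranked: list[str], gold: list[str]) -> float:
    """Hypothetical pAP: AP with binary relevance replaced by partial-match scores.

    Assumption (not from the paper): relevance at rank k is the best token F1
    between the k-th prediction and any gold answer not matched earlier.
    """
    if not gold:
        return 0.0
    remaining = list(gold)  # each gold answer can be credited at most once
    cumulative = 0.0        # running sum of partial relevance up to rank k
    total = 0.0
    for k, pred in enumerate(ranked, start=1):
        rel = 0.0
        if remaining:
            scores = [token_f1(pred, g) for g in remaining]
            best = max(range(len(scores)), key=scores.__getitem__)
            rel = scores[best]
            if rel > 0:
                remaining.pop(best)  # consume the matched gold answer
        cumulative += rel
        # partial precision at rank k, weighted by the partial relevance at k
        total += (cumulative / k) * rel
    return total / len(gold)
```

With exact matches this reduces to ordinary AP: a single gold answer found at rank 2 scores 0.5, whereas a rank-1 prediction covering half of a four-token gold answer (token F1 of 2/3) scores 4/9, roughly 0.44. Crediting each gold answer at most once keeps duplicate predictions from being rewarded twice on multi-answer questions.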


Similar Papers

Classifying and Segmenting Classical and Modern Standard Arabic using Minimum Cross-Entropy
Ibrahim S ... William J
International Journal of Advanced Computer Science and Applications | VOL. 8
01 Jan 2017

Patterns and Functions of Total Reduplication in Classical Arabic and Modern Standard Arabic:
Mohammed Modhaffer ... Sivaramakrishna Challavenkata
Linguistik Online | VOL. 94
29 Mar 2019

The distribution of nafs in modern Standard Arabic and Classical Arabic: a corpus-based study
Amani Mejri
Saudi Journal of Language Studies | VOL. 2
30 May 2022

IADD: An integrated Arabic dialect identification dataset
Jihad Zahir
Data in Brief | VOL. 40
30 Dec 2021
