Arabic Span Extraction-based Reading Comprehension Benchmark (ASER) and Neural Baseline Models

Mariam M Biltawi, Sara Tedmori, Arafat Awajan

doi:10.1145/3579047

Abstract

Machine reading comprehension (MRC) requires machines to read a given text and answer questions about it. This can be achieved either by predicting answers or by extracting them. Extracting an answer from text involves predicting the start and end indices of the answer span within the paragraph. Training machines to answer questions requires datasets built for that purpose, and the scarcity of benchmark datasets for Arabic has hindered research into machine reading comprehension of Arabic text. This article proposes the Arabic Span-Extraction-based Reading Comprehension Benchmark (ASER) and complements it with neural baseline models for performance evaluation. It describes in detail the steps for building and evaluating ASER, an Arabic dataset created manually for the task of machine reading comprehension. ASER contains 10,000 records from different domains and is divided into training and testing sets. The evaluation of ASER shows that it is a challenging benchmark: the answers vary in length, and human performance reached an exact match (EM) of 42%. Two main baseline models were the focus of the ASER experiments: the sequence-to-sequence (Seq2Seq) model with different neural networks and the bidirectional attention flow (BIDAF) model. These experiments used different embeddings, and the models' exact-match scores were lower than human performance.
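The abstract frames span extraction as predicting the start and end indices of the answer and reports results as exact match. The following is a minimal Python sketch of both ideas, not the authors' BIDAF or Seq2Seq baselines and not ASER's official evaluation script. It assumes a model that emits one start score and one end score per token; the names `best_span`, `normalize`, `exact_match`, the `max_len` cap, and the normalization rules (lowercasing, stripping Arabic diacritics and punctuation) are hypothetical choices made for illustration, and the tokens and scores are toy values.

```python
import re
import string
from typing import List, Tuple

# Assumed normalization: Arabic diacritics (tashkeel, U+064B..U+0652) are stripped.
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652]")
# ASCII punctuation plus the Arabic comma, semicolon and question mark.
PUNCTUATION = set(string.punctuation) | {"\u060C", "\u061B", "\u061F"}


def best_span(start_logits: List[float], end_logits: List[float],
              max_len: int = 30) -> Tuple[int, int]:
    """Pick (start, end) maximizing start score + end score, with
    start <= end and span length capped at max_len tokens (hypothetical cap)."""
    best, best_score = (0, 0), float("-inf")
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


def normalize(text: str) -> str:
    """Lowercase, drop diacritics and punctuation, collapse whitespace."""
    text = ARABIC_DIACRITICS.sub("", text.lower())
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


# Toy example ("The capital of Jordan is Amman."); scores are made up.
tokens = ["عاصمة", "الأردن", "هي", "عمان", "."]
start_logits = [0.1, 0.2, 0.3, 2.0, 0.1]
end_logits = [0.0, 0.1, 0.2, 1.5, 0.4]
start, end = best_span(start_logits, end_logits)  # → (3, 3)
answer = " ".join(tokens[start:end + 1])
print(answer, exact_match(answer, "عمان."))  # EM → 1.0
```

The exhaustive search costs O(n × max_len) per paragraph. Enforcing start ≤ end rules out the inverted spans that taking the argmax of start and end scores independently could produce, and exact match gives no partial credit, which is one reason answers of varying length make a benchmark like ASER harder.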


Journal: ACM Transactions on Asian and Low-Resource Language Information Processing
Publication Date: May 8, 2023
Citations: 2

Similar Papers

Self-Teaching Machines to Read and Comprehend with Large-Scale Multi-Subject Question-Answering Data
Dian Yu ... Dong Yu
01 Jan 2020

ViMRC - VLSP 2021: Context-Aware Answer Extraction in Vietnamese Question Answering
Thi Thu Hang Le
VNU Journal of Science: Computer Science and Communication Engineering | VOL. 38
16 Dec 2022

Multi-task joint training model for machine reading comprehension
Fangfang Li ... Shichao Zhang
Neurocomputing | VOL. 488
01 Mar 2022

Towards Reading Comprehension for Long Documents
Yuanxing Zhang ... Xiaoming Li
01 Jul 2018
