Abstract

Machine reading comprehension (MRC) requires machines to read and answer questions about a given text. This can be achieved through either predicting answers or extracting them. Extracting answers from text involves predicting the first and last index of the answer span within the paragraph. Training machines to answer questions requires datasets that are created for such a purpose. The lack of availability of benchmarking datasets for the Arabic language has hindered research into machine reading comprehension from Arabic text. The aim of this article is to propose an Arabic Span-Extraction-based Reading Comprehension Benchmark (ASER) and complement it with neural baseline models for performance evaluations. Detailed steps are depicted for building and evaluating ASER, which is an Arabic dataset created manually for the task of machine reading comprehension. It contains 10,000 records from different domains and is divided into training and testing sets. The results of ASER evaluation led to the conclusion that it is a challenging benchmark since the answers have varying lengths and human performance resulted in an exact match of 42%. On the other hand, two main baseline models were the focus of ASER experimentation: the sequence-to-sequence (Seq2Seq) model with different neural networks and the bidirectional attention flow (BIDAF) model. These experiments were implemented using different embeddings, and the results showed an exact match with lower values than human performance.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call