ViReader: A Wikipedia-based Vietnamese reading comprehension system using transfer learning

Kiet Van Nguyen, Anh Gia-Tuan Nguyen, Phong Nguyen-Thuan Do, Nhat Duy Nguyen, Ngan Luu-Thuy Nguyen

doi:10.3233/jifs-210683

Abstract

Machine reading comprehension has attracted significant interest in natural language understanding research, and large-scale datasets and neural network-based methods have been developed for the task. However, most resources and methods for machine reading comprehension have been developed for two resource-rich languages, English and Chinese. This article proposes ViReader, a system for open-domain machine reading comprehension in Vietnamese that uses Wikipedia as its textual knowledge source: the answer to any question is a text span taken directly from Vietnamese Wikipedia. The system combines a sentence retriever, which uses information retrieval techniques to extract relevant sentences, with a transfer learning-based answer extractor trained to predict answers from Wikipedia texts. The sentence retriever returns the sentences most likely to contain the answer to the given question. The answer extractor then reads the document from which those sentences were retrieved, predicts the answer, and returns it to the user. Experiments on multiple machine reading comprehension datasets in Vietnamese and other languages demonstrate that (1) ViReader is highly competitive with prevalent machine learning-based systems, and (2) multi-task learning that combines the sentence retriever and the answer extractor yields an end-to-end reading comprehension system. ViReader achieves new state-of-the-art performance, with 70.83% EM (exact match) and 89.54% F1, outperforming the BERT-based system by 11.55% and 9.54%, respectively. It also obtains state-of-the-art performance on UIT-ViNewsQA (another Vietnamese dataset, consisting of online health-domain news) and BiPaR (a bilingual dataset of English and Chinese novel texts). Compared with the BERT-based system, our system achieves significant F1 improvements on the BiPaR dataset: 7.65% for English and 6.13% for Chinese. Furthermore, we build a ViReader application programming interface that programmers can use in artificial intelligence applications.
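
The retrieval step described in the abstract can be illustrated with a minimal sketch. The abstract says only that the sentence retriever relies on "techniques of information retrieval"; the TF-IDF cosine ranking below is an assumption chosen for illustration, not the authors' method, and the names `tokenize` and `retrieve_sentences` are hypothetical. The answer extractor, a fine-tuned pre-trained model, is not shown.

```python
import math
import re
import unicodedata
from collections import Counter


def tokenize(text):
    # Normalize to NFC so Vietnamese diacritics form single characters,
    # then split into lowercased word tokens (\w is Unicode-aware).
    text = unicodedata.normalize("NFC", text).lower()
    return re.findall(r"\w+", text)


def retrieve_sentences(question, sentences, top_k=2):
    """Rank sentences by TF-IDF cosine similarity to the question.

    Illustrative assumption only: the paper's retriever may score
    sentences differently.
    """
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for d in docs for term in d)

    def weigh(counts):
        # Smoothed IDF keeps weights finite for unseen terms.
        return {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
                for t, c in counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = weigh(Counter(tokenize(question)))
    scored = [(cosine(q, weigh(d)), i) for i, d in enumerate(docs)]
    # Highest score first; ties keep the original sentence order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Drop sentences that share no term with the question.
    return [sentences[i] for score, i in scored[:top_k] if score > 0]


if __name__ == "__main__":
    wiki_sentences = [
        "Hà Nội là thủ đô của Việt Nam.",
        "Sông Hồng chảy qua miền Bắc.",
        "Huế từng là kinh đô.",
    ]
    for s in retrieve_sentences("Thủ đô của Việt Nam là gì?", wiki_sentences):
        print(s)
```

In a full pipeline, the retrieved sentences (or the document containing them) would be passed with the question to the answer extractor, which predicts the answer span.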

Journal: Journal of Intelligent & Fuzzy Systems
Publication Date: Aug 11, 2021
Citations: 5
