Abstract

Natural language inference (NLI) models are essential resources for many natural language understanding applications. State-of-the-art models are typically built by training or fine-tuning deep neural network architectures, so high-quality annotated datasets are essential for building them. We therefore propose a method for building a Vietnamese dataset to train Vietnamese inference models that work on native Vietnamese texts. Our approach addresses two issues: removing cue marks and preserving the writing style of Vietnamese texts. If a dataset contains cue marks, trained models can identify the relationship between a premise and a hypothesis without computing their semantics. For evaluation, we fine-tuned a BERT model, viNLI, on our dataset and compared it to a BERT model, viXNLI, fine-tuned on the XNLI dataset. On our Vietnamese test set, viNLI achieves an accuracy of 94.79%, while viXNLI achieves 64.04%. We also conducted an answer-selection experiment with the two models, in which viNLI and viXNLI obtain P@1 scores of 0.4949 and 0.4044, respectively. These results indicate that our method can be used to build a high-quality Vietnamese natural language inference dataset.
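For readers unfamiliar with the answer-selection metric reported above: P@1 (precision at 1) is the fraction of questions for which the model's top-ranked candidate answer is correct. A minimal sketch of the computation (the function name and data layout are illustrative assumptions, not taken from the paper):

```python
def precision_at_1(ranked_labels):
    """Compute P@1: the fraction of questions whose top-ranked
    candidate answer is labeled correct.

    ranked_labels: list of lists; each inner list holds 0/1 relevance
    labels for one question's candidates, ordered by model score
    (best candidate first).
    """
    if not ranked_labels:
        return 0.0
    hits = sum(1 for labels in ranked_labels if labels and labels[0] == 1)
    return hits / len(ranked_labels)

# Three questions: the model ranks a correct answer first for two of them,
# so P@1 = 2/3.
example = [
    [1, 0, 0],  # correct answer ranked first -> hit
    [0, 1, 0],  # correct answer ranked second -> miss
    [1, 1, 0],  # correct answer ranked first -> hit
]
```

In an NLI-based answer-selection setup, the ranking score for each candidate would typically be the model's entailment probability between the question (as premise) and the candidate answer (as hypothesis).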
