VnNLI - VLSP 2021: Vietnamese and English-Vietnamese Textual Entailment Based on Pre-trained Multilingual Language Models

Hoàng Xuân Vũ, Khoa Thi-Kim Phan, Ngan Nguyen Luu Thuy, Nguyễn Văn Tài, Đặng Văn Thìn

doi:10.25073/2588-1086/vnucsce.329

Open Access

Abstract

Natural Language Inference (NLI) is a high-level semantic task in Natural Language Processing (NLP), and it poses further challenges in the cross-lingual setting. In recent years, pre-trained multilingual language models (e.g., mBERT, XLM-R, InfoXLM) have contributed greatly to addressing these challenges. Building on these achievements, this paper describes our approach of fine-tuning pre-trained multilingual language models (XLM-R, InfoXLM) for the shared task "Vietnamese and English-Vietnamese Textual Entailment" at the 8th International Workshop on Vietnamese Language and Speech Processing (VLSP 2021, https://vlsp.org.vn/vlsp2021). We also investigate several techniques to improve performance: cross-validation, pseudo-labeling (PL), learning rate adjustment, and part-of-speech (POS) tagging. Experimental results show that our approach based on the InfoXLM model achieved competitive results, ranking second in the VLSP 2021 task evaluation with an F1-score of 0.89 on the private test set.
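The abstract names cross-validation and pseudo-labeling but does not detail how they were combined. The following is a minimal sketch of one common way to do it, not the authors' procedure: each of K fold models (assumed here to be fine-tuned XLM-R or InfoXLM classifiers, with the fine-tuning itself omitted) outputs class probabilities for unlabeled premise-hypothesis pairs, the probabilities are averaged across folds, and only pairs whose top averaged probability reaches a confidence threshold receive a pseudo-label. The function names, the 0.9 threshold, and the integer class ids are hypothetical choices for illustration.

```python
from typing import List, Sequence, Tuple


def average_fold_probs(
    fold_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average class probabilities from K cross-validation models.

    fold_probs[k][i][c] is fold model k's probability that example i has class c.
    """
    k = len(fold_probs)
    n_examples = len(fold_probs[0])
    n_classes = len(fold_probs[0][0])
    return [
        [sum(fold_probs[m][i][c] for m in range(k)) / k for c in range(n_classes)]
        for i in range(n_examples)
    ]


def select_pseudo_labels(
    unlabeled: Sequence[Tuple[str, str]],
    probs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical confidence cut-off
) -> List[Tuple[str, str, int]]:
    """Return (premise, hypothesis, class_id) for confidently predicted pairs only."""
    selected = []
    for (premise, hypothesis), p in zip(unlabeled, probs):
        best = max(range(len(p)), key=lambda c: p[c])  # argmax class id
        if p[best] >= threshold:
            selected.append((premise, hypothesis, best))
    return selected


# Toy example: two fold models, three unlabeled pairs, three hypothetical classes.
unlabeled = [("p0", "h0"), ("p1", "h1"), ("p2", "h2")]
fold_probs = [
    [[0.95, 0.03, 0.02], [0.50, 0.30, 0.20], [0.10, 0.05, 0.85]],  # fold 1
    [[0.91, 0.05, 0.04], [0.40, 0.40, 0.20], [0.05, 0.02, 0.93]],  # fold 2
]
probs = average_fold_probs(fold_probs)
pseudo = select_pseudo_labels(unlabeled, probs, threshold=0.9)
print(pseudo)  # [('p0', 'h0', 0)]
```

In this toy run only the first pair passes the threshold (averaged probability 0.93); the third pair averages 0.89 and is left unlabeled. In a typical pseudo-labeling loop the selected pairs would be appended to the labeled training data before another round of fine-tuning; a higher threshold trades fewer added examples for less label noise.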

Full Text