Abstract

Paraphrase identification is a crucial task in natural language understanding, especially in cross-language information retrieval. The Multi-Task Deep Neural Network (MT-DNN) has become a state-of-the-art method for paraphrase identification, achieving strong results [1]. In this paper, we propose a method based on MT-DNN [2] to detect similarities between English and Vietnamese sentences. We replaced the shared layers of the original MT-DNN, which are based on BERT [3], with other pre-trained multilingual models such as M-BERT [3] and XLM-R [4], so that our model can handle cross-language (in our case, English-Vietnamese) information retrieval. We also added several tasks to further improve the results. As a result, we obtained increases of 2.3% in accuracy and 2.5% in F1. The proposed method was also applied to other language pairs, namely English-German and English-French, yielding improvements of 1.0%/0.7% for English-German and 0.7%/0.5% for English-French.
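The abstract reports gains in accuracy and F1 without restating how these metrics are computed. For binary paraphrase identification they are usually defined as follows. This is a minimal sketch that assumes label 1 means "paraphrase" and is treated as the positive class for F1; the function name `accuracy_and_f1` is hypothetical and not taken from the authors' code.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, with 1 (paraphrase) as the positive class."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and of equal length")
    # Count true positives, false positives, and false negatives for class 1
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    accuracy = correct / len(gold)
    # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when there are no positives at all
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

For gold labels `[1, 0, 1, 1, 0]` and predictions `[1, 0, 0, 1, 1]`, this gives an accuracy of 0.6 and an F1 of about 0.667 (TP = 2, FP = 1, FN = 1).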

Highlights

  • Paraphrase Identification (PI) is a task in Natural Language Processing (NLP) that concerns detecting whether a pair of text fragments have the same meaning at different textual levels [1]

  • We replaced Bidirectional Encoder Representations from Transformers (BERT) with Multilingual BERT (M-BERT) and XLM-R so that the model can handle the English-Vietnamese language pair

  • We studied and present the application of Multi-Task Deep Neural Networks (MT-DNN) with transfer learning, combined with modified Multi-Task Learning (MTL), to the cross-language English-Vietnamese pair, achieving competitive performance on the paraphrase identification task
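The highlights describe MT-DNN-style multi-task learning on top of a shared multilingual encoder. In the original MT-DNN [2], each task's dataset is packed into mini-batches; every epoch, the mini-batches of all tasks are merged and shuffled, and each batch updates the shared layers plus the head of the task it came from. The sketch below covers only that scheduling logic, with the loss, gradient, and update step abstracted into a `step` callback. The task names `"pi"` and `"nli"`, the batch sizes, and all function names are hypothetical; the specific tasks this paper adds are not listed in this section.

```python
import random
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

Batch = List[dict]


def pack_into_batches(examples: Sequence[dict], batch_size: int) -> List[Batch]:
    """Split one task's dataset into mini-batches (the last one may be smaller)."""
    return [list(examples[i:i + batch_size]) for i in range(0, len(examples), batch_size)]


def multitask_schedule(
    datasets: Dict[str, Sequence[dict]],
    batch_size: int,
    epochs: int,
    seed: int = 0,
) -> Iterator[Tuple[int, str, Batch]]:
    """Yield (epoch, task_name, mini_batch) in MT-DNN order.

    Each epoch merges the mini-batches of all tasks, shuffles them, and visits
    each one once; every mini-batch holds examples from a single task.
    """
    rng = random.Random(seed)
    packed = {task: pack_into_batches(data, batch_size) for task, data in datasets.items()}
    for epoch in range(epochs):
        merged = [(task, b) for task, batches in packed.items() for b in batches]
        rng.shuffle(merged)
        for task, batch in merged:
            yield epoch, task, batch


def train(
    datasets: Dict[str, Sequence[dict]],
    batch_size: int,
    epochs: int,
    step: Callable[[str, Batch], None],
    seed: int = 0,
) -> int:
    """Run the schedule and return the number of update steps taken.

    `step(task, batch)` stands in for computing the task-specific loss and
    updating the shared encoder together with that task's output head.
    """
    n_steps = 0
    for _, task, batch in multitask_schedule(datasets, batch_size, epochs, seed):
        step(task, batch)
        n_steps += 1
    return n_steps
```

With 5 examples for `"pi"` and 3 for `"nli"` at batch size 2, each epoch visits 3 + 2 = 5 task-homogeneous mini-batches in shuffled order, so two epochs take 10 steps. Keeping batches task-homogeneous lets each step use a single output head, while shuffling across tasks keeps the shared encoder from drifting toward whichever task happens to be trained last.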



INTRODUCTION

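Paraphrase Identification (PI) is a task in Natural Language Processing (NLP) that concerns detecting whether a pair of text fragments have the same meaning at different textual levels [1]. Consider, for example, the English sentence "Singapore is already the United States' 12th-largest trading partner, with two-way trade totaling more than $34 billion." and the Vietnamese sentences "Singapore đã là đối tác thương mại lớn thứ 12 của Hoa Kỳ, với tổng kim ngạch thương mại hai chiều hơn 34 tỷ USD." (Singapore is already the 12th-largest trading partner of the United States, with total two-way trade of more than 34 billion USD.) and "Mặc dù là một thành phố nhỏ, Singapore là đối tác thương mại lớn thứ 12 của Hoa Kỳ, với kim ngạch thương mại đạt 33,4 tỷ USD vào năm ngoái." (Although it is a small city, Singapore is the 12th-largest trading partner of the United States, with trade reaching 33.4 billion USD last year.) A cross-language PI system must decide, for each English-Vietnamese pair, whether the two sentences express the same meaning. The objective of this study is to improve PI between pairs of multilingual documents (namely English and Vietnamese) by applying transfer learning from a better pre-trained language model with an MTL approach, including the addition of new, improved tasks.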
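In MT-DNN-style sentence-pair classification, both sentences of a pair are fed to the shared encoder in a single sequence, and the special tokens that separate them depend on the encoder. BERT and M-BERT use `[CLS] A [SEP] B [SEP]` with segment ids 0 and 1, while XLM-R uses `<s> A </s></s> B </s>` and does not rely on segment ids. The following is a minimal sketch of those two layouts only; it assumes whitespace-split tokens in place of the real WordPiece (M-BERT) or SentencePiece (XLM-R) subword tokenizers, and the helper `pack_pair` is hypothetical rather than the authors' preprocessing code.

```python
from typing import List, Tuple


def pack_pair(tokens_a: List[str], tokens_b: List[str], style: str) -> Tuple[List[str], List[int]]:
    """Pack a sentence pair into one encoder input.

    style="bert": [CLS] A [SEP] B [SEP]; segment id 0 for A and its markers, 1 for B.
    style="xlmr": <s> A </s> </s> B </s>; XLM-R has no segment embeddings, so all ids are 0.
    """
    if style == "bert":
        tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
        segments = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    elif style == "xlmr":
        tokens = ["<s>"] + tokens_a + ["</s>", "</s>"] + tokens_b + ["</s>"]
        segments = [0] * len(tokens)
    else:
        raise ValueError(f"unknown style: {style}")
    return tokens, segments
```

Applied to the first clause of the English sentence (9 whitespace tokens) and of its Vietnamese translation (13 tokens), the BERT-style layout yields 25 positions (11 with segment id 0, 14 with segment id 1) and the XLM-R layout yields 26. Because the encoder attends across both sentences at once, a multilingual encoder can align "trading partner" with "đối tác thương mại" directly, which is what makes swapping BERT for M-BERT or XLM-R sufficient for cross-language pairs.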

Pre-Trained Model and Transfer Learning
Paraphrase Identification Methods
Schematic Overview
Updating the Pretrained Model
Adding More Tasks
Preparing the Dataset
Fine-Tuning
Configuration
RESULT AND DISCUSSION
Findings
CONCLUSION