Abstract
Paraphrase identification is a crucial task in natural language understanding, especially in cross-language information retrieval. The Multi-Task Deep Neural Network (MT-DNN) has become a state-of-the-art method that achieves outstanding results in paraphrase identification [1]. In this paper, we propose a method based on MT-DNN [2] to detect similarities between English and Vietnamese sentences. We changed the shared layers of the original MT-DNN from the original BERT [3] to other pre-trained multilingual models such as M-BERT [3] or XLM-R [4] so that our model could work on cross-language (in our case, English and Vietnamese) information retrieval. We also added some tasks as improvements to gain better results. As a result, we gained increases of 2.3% and 2.5% in evaluated accuracy and F1, respectively. The proposed method was also implemented on other language pairs such as English – German and English – French. With those implementations, we obtained a 1.0%/0.7% improvement for English – German and a 0.7%/0.5% improvement for English – French.
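The two reported metrics, accuracy and F1, can be illustrated with a minimal sketch. It assumes binary labels where 1 means "paraphrase" and 0 means "not a paraphrase"; the function name `accuracy_and_f1` and the toy labels are hypothetical and are not taken from the paper's evaluation code.

```python
from typing import Sequence, Tuple


def accuracy_and_f1(gold: Sequence[int], pred: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, F1) for binary labels, where 1 = paraphrase (assumed encoding)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("at least one example is required")

    # Count the confusion-matrix cells needed for both metrics.
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    correct = sum(1 for g, p in zip(gold, pred) if g == p)

    accuracy = correct / len(gold)
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives at all.
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1


# Toy labels for illustration only; these are not results from the paper.
gold = [1, 1, 0, 1, 0, 0]
pred = [1, 0, 0, 1, 1, 0]
acc, f1 = accuracy_and_f1(gold, pred)
print(f"accuracy={acc:.3f} F1={f1:.3f}")
```

F1 is usually reported alongside accuracy for this task because it is less sensitive to an imbalance between paraphrase and non-paraphrase pairs.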
Highlights
Paraphrase Identification (PI) is a task in Natural Language Processing (NLP) that concerns detecting whether a pair of text fragments has the same meaning at different textual levels [1]
We replaced Bidirectional Encoder Representations from Transformers (BERT) with Multilingual BERT (M-BERT) and XLM-R so that the model can work on the English-Vietnamese language pair
We studied and presented the application of Multi-Task Deep Neural Networks (MT-DNN) with transfer learning, combined with modified Multi-Task Learning (MTL), to the cross-language English - Vietnamese pair to achieve competitive performance in the paraphrase identification task
Summary
Paraphrase Identification (PI) is a task in Natural Language Processing (NLP) that concerns detecting whether a pair of text fragments has the same meaning at different textual levels [1]. Consider, for example, the following English sentence and two Vietnamese sentences:
Singapore is already the United States' 12th-largest trading partner, with two-way trade totaling more than $34 billion.
Singapore đã là đối tác thương mại lớn thứ 12 của Hoa Kỳ, với tổng kim ngạch thương mại hai chiều hơn 34 tỷ USD. (Singapore is already the 12th-largest trading partner of the United States, with total two-way trade of more than USD 34 billion.)
Mặc dù là một thành phố nhỏ, Singapore là đối tác thương mại lớn thứ 12 của Hoa Kỳ, với kim ngạch thương mại đạt 33,4 tỷ USD vào năm ngoái. (Although it is a small city, Singapore is the 12th-largest trading partner of the United States, with trade reaching USD 33.4 billion last year.)
The objective of this study is to improve the PI task between pairs of multilingual documents (namely English and Vietnamese) by applying transfer learning from a better pre-trained language model, with an MTL approach that includes adding new, improved tasks.
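The architectural idea, a shared encoder that can be swapped while task-specific heads sit on top of it, can be sketched structurally as follows. This is only an illustration under strong assumptions: `toy_overlap_encoder` and `exact_match_encoder` are hypothetical stand-ins for a pre-trained multilingual encoder such as M-BERT or XLM-R, the heads are fixed hand-written functions rather than trained layers, and the class and task names are invented for the example. It does not reproduce the model or the tasks used in the study.

```python
from typing import Callable, Dict, List

# A shared encoder maps a sentence pair to a fixed-size feature vector.
Encoder = Callable[[str, str], List[float]]
# A task head maps that feature vector to a task-specific score.
Head = Callable[[List[float]], float]


def toy_overlap_encoder(sent_a: str, sent_b: str) -> List[float]:
    """Hypothetical stand-in for a multilingual encoder (no cross-lingual ability)."""
    tokens_a = set(sent_a.lower().split())
    tokens_b = set(sent_b.lower().split())
    union = tokens_a | tokens_b
    jaccard = len(tokens_a & tokens_b) / len(union) if union else 0.0
    longest = max(len(sent_a), len(sent_b))
    length_ratio = min(len(sent_a), len(sent_b)) / longest if longest else 1.0
    return [jaccard, length_ratio]


def exact_match_encoder(sent_a: str, sent_b: str) -> List[float]:
    """A second hypothetical encoder, used only to show that encoders are swappable."""
    return [1.0 if sent_a == sent_b else 0.0, 1.0]


class MultiTaskModel:
    """Shared encoder plus one head per task (structure only, no training)."""

    def __init__(self, encoder: Encoder) -> None:
        self.encoder = encoder
        self.heads: Dict[str, Head] = {}

    def add_task(self, name: str, head: Head) -> None:
        # Adding a task only registers a new head; the encoder stays shared.
        self.heads[name] = head

    def predict(self, task: str, sent_a: str, sent_b: str) -> float:
        features = self.encoder(sent_a, sent_b)
        return self.heads[task](features)


model = MultiTaskModel(encoder=toy_overlap_encoder)
model.add_task("paraphrase", lambda f: 1.0 if f[0] >= 0.5 else 0.0)
model.add_task("similarity", lambda f: 5.0 * f[0])

pair = ("Singapore is a trading partner", "Singapore is a major trading partner")
paraphrase_score = model.predict("paraphrase", *pair)
similarity_score = model.predict("similarity", *pair)
print(paraphrase_score, similarity_score)

# Swapping the shared encoder leaves the registered task heads untouched.
model.encoder = exact_match_encoder
swapped_score = model.predict("paraphrase", *pair)
print(swapped_score)
```

In a real system the encoder would be a pre-trained network and the heads would be trained jointly; the sketch only shows why changing the shared layers does not require redefining the tasks.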