Abstract
Paraphrase detection and generation are important natural language processing (NLP) tasks. Yet the term paraphrase is broad enough to include many fine-grained relations. This leads to different tolerance levels of semantic divergence in the positive paraphrase class among publicly available paraphrase datasets. Such variation can affect the generalisability of paraphrase classification models. It may also impact the predictability of paraphrase generation models. This paper presents a new method to automatically construct corpora of fine-grained paraphrase relations using language inference models. The fine-grained sentence-level paraphrase relations are defined based on their word- and phrase-level counterparts. We demonstrate that the fine-grained labels from our proposed system make it possible to generate paraphrases at a desirable semantic level. The new labels could also contribute to general sentence embedding techniques.
Highlights
Paraphrase detection and generation are important natural language processing (NLP) tasks
After examining the size and sentence relations in various datasets, we focus the construction efforts on two language inference datasets: Multi-Genre Natural Language Inference (MNLI) and Stanford Natural Language Inference (SNLI), and three paraphrase datasets: Microsoft Research Paraphrase Corpus (MRPC) [1], Quora Question Pairs (QQP) [12] and the semantic textual similarity benchmark (STS-B) [13]
Since users are not expected to have seen all questions, the dataset is bound to contain a relatively high number of false negative samples. We find that both the MNLI and SNLI classifiers tend to make wrong predictions on sentences with an ambiguous pronoun reference
Summary
Paraphrase detection and generation are important natural language processing (NLP) tasks. A dataset with a stricter rule may label the second sentence pair as a negative case. Such variation can affect the generalisability of a paraphrase classification model. This paper proposes a novel method to automatically generate fine-grained paraphrase labels using language inference models. We developed a method utilising the language inference model to automatically assign fine-grained labels to sentence pairs in existing paraphrase and language inference corpora. We find that, compared with Quora Question Pairs (QQP), MRPC tolerates more semantic divergence in its positive class, which contains more directional paraphrases than equivalent ones, and that Multi-Genre Natural Language Inference (MNLI) contains more diversified sentence pairs in all three classes. Such information may help researchers to design customised optimisation and provide insights on observed performance variation
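The labelling step described above can be sketched in code. This is a minimal illustration, not the authors' exact implementation: it assumes an NLI classifier that returns one of entailment, neutral, or contradiction for an ordered sentence pair, and combines the two directional predictions into a fine-grained paraphrase relation. The relation names used here (equivalent, directional, contradictory, unrelated) are illustrative placeholders, not the paper's exact label set.

```python
# Sketch: derive a fine-grained paraphrase relation from bidirectional
# NLI predictions. An "equivalent" paraphrase entails in both directions;
# a "directional" one entails in only one direction.

def fine_grained_label(a_to_b: str, b_to_a: str) -> str:
    """Map NLI predictions for (A -> B, B -> A) to a paraphrase relation.

    Each argument is one of: "entailment", "neutral", "contradiction".
    """
    if a_to_b == "entailment" and b_to_a == "entailment":
        return "equivalent"      # mutual entailment: full paraphrase
    if a_to_b == "entailment" or b_to_a == "entailment":
        return "directional"     # one-way entailment: forward/backward paraphrase
    if "contradiction" in (a_to_b, b_to_a):
        return "contradictory"   # semantically incompatible pair
    return "unrelated"           # neutral in both directions
```

In practice, the two directional predictions could come from an off-the-shelf NLI model (for example, an MNLI-trained classifier queried once per direction), after which this mapping assigns the sentence-level label.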