Modeling the Paraphrase Detection Task over a Heterogeneous Graph Network with Data Augmentation

Rafael T Anchiêta, Rogério F De Sousa, Thiago A S Pardo

doi:10.3390/info11090422

Open Access

https://doi.org/10.3390/info11090422

Abstract

Paraphrase detection is a Natural-Language Processing (NLP) task that aims to automatically identify whether two sentences convey the same meaning, even when they use different words. For the Portuguese language, most works model this task as a machine-learning problem, extracting features and training a classifier. In this paper, following a different line, we explore a graph structure representation and model the paraphrase identification task over a heterogeneous network. We also adopt a back-translation strategy for data augmentation to balance the dataset we use. Our approach, although simple, outperforms the best results reported for the paraphrase detection task in Portuguese, showing that graph structures may better capture the semantic relatedness among sentences.
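
The back-translation idea mentioned in the abstract can be illustrated with a minimal sketch. This is an assumption-laden illustration, not the authors' pipeline: the `back_translate` callable is a hypothetical placeholder for any translator that maps a sentence into a pivot language and back, and the toy word substitution below only stands in for it.

```python
from typing import Callable, List, Tuple

# (sentence_a, sentence_b, label); label 1 = paraphrase, 0 = not a paraphrase
Pair = Tuple[str, str, int]


def balance_by_back_translation(
    pairs: List[Pair],
    back_translate: Callable[[str], str],
) -> List[Pair]:
    """Add one back-translated copy per minority-class pair, up to the deficit.

    `back_translate` is a hypothetical callable supplied by the caller.
    The classes may remain uneven if the deficit exceeds the minority size.
    """
    positives = [p for p in pairs if p[2] == 1]
    negatives = [p for p in pairs if p[2] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    deficit = len(majority) - len(minority)

    augmented = list(pairs)
    for a, b, label in minority[:deficit]:
        # The label is kept, assuming back-translation preserves meaning.
        augmented.append((back_translate(a), back_translate(b), label))
    return augmented


def toy_back_translate(sentence: str) -> str:
    """Stand-in for a real translator: swaps one word for a synonym."""
    return sentence.replace("carro", "automóvel")


toy_pairs: List[Pair] = [
    ("o carro parou", "o carro ficou parado", 1),
    ("o gato dorme", "a bolsa caiu", 0),
    ("chove muito", "o carro parou", 0),
    ("ela canta bem", "ele corre cedo", 0),
]
balanced = balance_by_back_translation(toy_pairs, toy_back_translate)
```

In this toy run the single paraphrase pair gains one rewritten copy. A real setup would also need to check that the back-translated sentences still mean the same thing, since the quality of synthetic data is the concern raised in the summary.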

Highlights

Paraphrase detection is a Natural-Language Processing (NLP) task that aims to automatically identify whether two sentences convey the same meaning.
Inverse Frequency (SIF) [20], and weighted aggregation based on Inverse Document Frequency (IDF)
We detailed the methods developed for paraphrase identification and our strategy to mitigate the class imbalance of the ASSIN corpus.
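
The IDF-weighted aggregation named in the highlights can be sketched as follows. The excerpt does not say how the weights are computed, so the smoothed formula `log(N / df) + 1`, the toy corpus, and the two-dimensional word vectors are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def idf_weights(corpus: List[List[str]]) -> Dict[str, float]:
    """Compute log(N / df) + 1 for every token in a tokenized corpus."""
    n_docs = len(corpus)
    # Count each token once per sentence (document frequency).
    doc_freq = Counter(tok for sent in corpus for tok in set(sent))
    return {tok: math.log(n_docs / df) + 1.0 for tok, df in doc_freq.items()}


def idf_weighted_embedding(
    tokens: List[str],
    vectors: Dict[str, List[float]],
    idf: Dict[str, float],
) -> List[float]:
    """Average word vectors, weighting each token by its normalized IDF.

    Returns an empty list when no token has both a vector and an IDF weight.
    """
    known = [t for t in tokens if t in vectors and t in idf]
    if not known:
        return []
    dim = len(vectors[known[0]])
    total = sum(idf[t] for t in known)
    out = [0.0] * dim
    for t in known:
        weight = idf[t] / total
        for k in range(dim):
            out[k] += weight * vectors[t][k]
    return out


# Toy data: hypothetical corpus and vectors, chosen only to show the effect.
toy_corpus = [
    ["o", "gato", "dorme"],
    ["o", "cachorro", "dorme"],
    ["o", "gato", "come"],
]
toy_vectors = {"o": [1.0, 0.0], "gato": [0.0, 1.0]}
toy_idf = idf_weights(toy_corpus)
sentence_vector = idf_weighted_embedding(["o", "gato"], toy_vectors, toy_idf)
```

Because "o" occurs in every toy sentence, it receives the lowest weight, so the sentence vector leans toward the rarer word "gato".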

Summary

Introduction

Paraphrase detection is a Natural-Language Processing (NLP) task that aims to automatically identify whether two sentences convey the same meaning. The existing works that aim to detect paraphrase sentences in Portuguese [3,10] model this task as a machine-learning problem, building feature-value tables and training and testing classifiers. Their authors apply sampling techniques to mitigate the class imbalance of the ASSIN corpus, aiming to get more balanced data and improve the results of their models. Other strategies that make use of synthetic data are criticized for the quality of the generated data. To fill these gaps and explore other approaches for paraphrase detection, in this paper, inspired by Sousa et al. [13], we model the paraphrase detection task as a heterogeneous network.
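
To make the notion of a heterogeneous network concrete, here is a minimal sketch of a graph with two node types, sentences and tokens, linked whenever a token occurs in a sentence. The node types, the lowercasing step, and the Jaccard-style `neighbor_overlap` score are assumptions for illustration; this excerpt does not describe the authors' actual network construction or learning procedure.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (node_type, identifier), e.g. ("sentence", "s1") or ("token", "gato")
Node = Tuple[str, str]


def build_graph(sentences: Dict[str, List[str]]) -> Dict[Node, Set[Node]]:
    """Build an undirected sentence-token graph as an adjacency map."""
    graph: Dict[Node, Set[Node]] = defaultdict(set)
    for sent_id, tokens in sentences.items():
        s_node = ("sentence", sent_id)
        graph[s_node]  # register the sentence even if it has no tokens
        for tok in tokens:
            t_node = ("token", tok.lower())
            graph[s_node].add(t_node)
            graph[t_node].add(s_node)
    return graph


def neighbor_overlap(graph: Dict[Node, Set[Node]], a: Node, b: Node) -> float:
    """Jaccard overlap of the neighbor sets of two nodes (0.0 if both empty)."""
    na, nb = graph.get(a, set()), graph.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Toy sentences, already tokenized (hypothetical data).
toy_sentences = {
    "s1": ["O", "gato", "dorme", "no", "sofá"],
    "s2": ["O", "gato", "dorme", "sobre", "o", "sofá"],
    "s3": ["A", "bolsa", "caiu", "hoje"],
}
toy_graph = build_graph(toy_sentences)
close = neighbor_overlap(toy_graph, ("sentence", "s1"), ("sentence", "s2"))
far = neighbor_overlap(toy_graph, ("sentence", "s1"), ("sentence", "s3"))
```

Sentences that share many token neighbors score higher than unrelated ones. A full approach would typically learn representations over such a graph rather than rely on raw overlap.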

Related Work

The ASSIN Corpus

The MSRP Corpus

Balancing the ASSIN Corpus

Modeling the Paraphrase Identification Task

Formulating the Paraphrase Identification Task

Experiments and Results

Method

Final Remarks

Journal: Information
Publication Date: Sep 1, 2020
Citations: 3
License type: CC BY 4.0
