Abstract
Paraphrase detection is a Natural-Language Processing (NLP) task that aims to automatically identify whether two sentences convey the same meaning, even when they use different words. For Portuguese, most existing work models this task as a machine-learning problem, extracting features and training a classifier. In this paper, we take a different line: we explore a graph representation and model the paraphrase identification task over a heterogeneous network. We also adopt a back-translation strategy for data augmentation to balance the dataset we use. Our approach, although simple, outperforms the best results reported for paraphrase detection in Portuguese, showing that graph structures may better capture the semantic relatedness among sentences.
Highlights
Paraphrase detection is a Natural-Language Processing (NLP) task that aims to automatically identify whether two sentences convey the same meaning
Inverse Frequency (SIF) [20], and weighted aggregation based on Inverse Document Frequency (IDF)
We detail the methods developed for paraphrase identification and our strategy to mitigate the class imbalance of the ASSIN corpus
Summary
Paraphrase detection is a Natural-Language Processing (NLP) task that aims to automatically identify whether two sentences convey the same meaning. The existing works that detect paraphrases in Portuguese [3,10] model this task as a machine-learning solution, building feature-value tables and then training and testing classifiers. These authors apply sampling techniques to mitigate the class imbalance of the ASSIN corpus, aiming for more balanced data that improves the results of their models. Other strategies that rely on synthetic data have been criticized for the quality of the generated data. To fill these gaps and explore other approaches to paraphrase detection, in this paper, inspired by Sousa et al. [13], we model the paraphrase detection task as a heterogeneous network.
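To make the idea of a heterogeneous network concrete, the following minimal Python sketch builds a graph with two node types, sentences and tokens, and scores a sentence pair by the overlap of their token neighbors. This is only an illustration under stated assumptions: the node types, the whitespace tokenization, the helper names `build_network` and `neighbor_overlap`, and the Jaccard score are all hypothetical choices made here, not the network construction or classifier used in the paper, which this summary does not specify.

```python
from collections import defaultdict


def build_network(sentences):
    """Build adjacency sets for a network with sentence and token nodes.

    Assumption: nodes are (type, value) tuples and tokens come from a
    naive lowercase whitespace split. A real system would use a proper
    Portuguese tokenizer and possibly more node types.
    """
    graph = defaultdict(set)
    for idx, text in enumerate(sentences):
        sent_node = ("sentence", idx)
        for token in text.lower().split():
            token_node = ("token", token)
            # Undirected edge between a sentence and each token it contains.
            graph[sent_node].add(token_node)
            graph[token_node].add(sent_node)
    return graph


def neighbor_overlap(graph, i, j):
    """Jaccard overlap between the token neighbors of sentences i and j."""
    a = graph[("sentence", i)]
    b = graph[("sentence", j)]
    union = a | b
    return len(a & b) / len(union) if union else 0.0


sentences = [
    "o gato dorme no sofá",
    "o gato está dormindo no sofá",
    "a bolsa caiu hoje",
]
graph = build_network(sentences)
score_close = neighbor_overlap(graph, 0, 1)  # sentences sharing most tokens
score_far = neighbor_overlap(graph, 0, 2)    # sentences sharing no tokens
print(score_close, score_far)
```

In this toy setting, the first two sentences share four of seven distinct tokens, while the first and third share none. A network-based model would go beyond such direct overlap, for example by propagating information through shared nodes, but the sketch shows how sentence relatedness can be read off the graph structure rather than a feature table.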