Abstract

Paraphrase generation is an important yet challenging task in natural language processing. Neural network-based approaches have achieved remarkable success in sequence-to-sequence learning. Previous work on paraphrase generation generally ignores syntactic information even when it is readily available, on the assumption that neural networks can learn such linguistic knowledge implicitly. In this work, we probe the efficacy of explicit syntactic information for the task of paraphrase generation. Syntactic information can take the form of dependency trees, which are easily acquired from off-the-shelf syntactic parsers. Such tree structures can be conveniently encoded via graph convolutional networks to obtain more informative sentence representations, which in turn can improve the generated paraphrases. Through extensive experiments on four paraphrase datasets of different sizes and genres, we demonstrate the utility of syntactic information in neural paraphrase generation under the framework of sequence-to-sequence modeling. Specifically, our graph convolutional network-enhanced models consistently outperform their syntax-agnostic counterparts across multiple evaluation metrics.
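
To make the pipeline in the abstract concrete, the sketch below turns a dependency parse from an off-the-shelf parser into an adjacency matrix and applies one Kipf-and-Welling-style graph-convolution step over toy word embeddings. The choice of spaCy (with the en_core_web_sm model installed), the embedding size, and the random weights are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch: dependency parse -> adjacency matrix -> one GCN step.
# spaCy, the toy embeddings, and the weights are assumptions for illustration.
import numpy as np
import spacy

nlp = spacy.load("en_core_web_sm")          # any off-the-shelf dependency parser works
doc = nlp("The quick brown fox jumps over the lazy dog")
n = len(doc)

# Symmetric adjacency matrix from dependency arcs, plus self-loops.
A = np.eye(n)
for tok in doc:
    if tok.i != tok.head.i:                 # skip the root's self-arc
        A[tok.i, tok.head.i] = 1.0
        A[tok.head.i, tok.i] = 1.0

# Symmetric normalisation D^{-1/2} A D^{-1/2}.
d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
A_hat = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# One GCN layer over toy word embeddings: H' = ReLU(A_hat @ H @ W).
rng = np.random.default_rng(0)
H = rng.normal(size=(n, 64))                # placeholder word embeddings
W = rng.normal(size=(64, 64)) * 0.1
H_next = np.maximum(A_hat @ H @ W, 0.0)     # syntax-aware token representations
print(H_next.shape)                         # (9, 64)
```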

Highlights

  • We propose to encode syntactic information for paraphrase generation via graph convolutional networks.

  • By stacking graph convolutional networks (GCNs) on top of recurrent networks on the encoder side of our paraphrasing model, we achieve significantly better results on four paraphrase datasets of varying sizes and genres (see the sketch after this list).

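The second highlight describes stacking GCN layers on top of a recurrent encoder. Below is a minimal PyTorch sketch of such a syntax-aware encoder, assuming a BiLSTM, unlabelled dependency arcs, and illustrative layer sizes; the paper's exact configuration (edge labels, gating, number of layers) may differ.

```python
# Sketch of a syntax-aware encoder: GCN layers stacked on a BiLSTM.
# Dimensions, layer counts, and the plain unlabelled-edge GCN are assumptions.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, h, adj):
        # adj: (batch, seq, seq) normalised adjacency built from dependency arcs
        return torch.relu(adj @ self.linear(h))

class SyntaxAwareEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=256, num_gcn_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hid_dim // 2, batch_first=True,
                              bidirectional=True)
        self.gcn_layers = nn.ModuleList(
            [GCNLayer(hid_dim) for _ in range(num_gcn_layers)])

    def forward(self, token_ids, adj):
        h, _ = self.bilstm(self.embed(token_ids))   # contextual token states
        for gcn in self.gcn_layers:                 # refine them along dependency arcs
            h = gcn(h, adj)
        return h                                    # fed to the attention/decoder

# Toy usage: batch of 2 sentences, 5 tokens each, identity adjacency as a placeholder.
enc = SyntaxAwareEncoder(vocab_size=1000)
ids = torch.randint(0, 1000, (2, 5))
adj = torch.eye(5).expand(2, 5, 5)
print(enc(ids, adj).shape)                          # torch.Size([2, 5, 256])
```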

Introduction

We provide a brief description of two component models that are necessary to understand the method proposed in this work. A source sentence X = (x_1, ..., x_J) is fed into the encoder to give a fixed-length hidden vector, presumably encoding the “meaning” of the sentence. This hidden vector is a continuous representation often referred to as the context vector. The target sentence Y = (y_1, ..., y_I) is produced one symbol at a time, conditioning on the context vector provided by the encoder, where I is not necessarily equal to J.
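
The following schematic PyTorch sketch illustrates this encoder-decoder process: the encoder compresses X = (x_1, ..., x_J) into a context vector, and the decoder emits one symbol at a time until an end-of-sequence token, so the output length I need not equal J. The GRU cells, the BOS/EOS tokens, and the dimensions are illustrative assumptions rather than the paper's actual model.

```python
# Schematic sketch of conditional, one-symbol-at-a-time generation from a
# fixed-length context vector. Vocabulary, dimensions, and special tokens are toy values.
import torch
import torch.nn as nn

VOCAB, EMB, HID, BOS, EOS, MAX_LEN = 1000, 128, 256, 1, 2, 20

embed = nn.Embedding(VOCAB, EMB)
encoder = nn.GRU(EMB, HID, batch_first=True)
decoder_cell = nn.GRUCell(EMB, HID)
out_proj = nn.Linear(HID, VOCAB)

def paraphrase_greedy(src_ids):
    # Encode the source sentence into a fixed-length context vector.
    _, context = encoder(embed(src_ids))          # (1, batch, HID)
    state = context.squeeze(0)                    # decoder starts from the context

    y, outputs = torch.tensor([BOS]), []
    for _ in range(MAX_LEN):                      # generate one symbol at a time
        state = decoder_cell(embed(y), state)
        y = out_proj(state).argmax(dim=-1)        # greedy choice of the next word
        if y.item() == EOS:
            break
        outputs.append(y.item())
    return outputs                                # length I, not necessarily J

print(paraphrase_greedy(torch.randint(3, VOCAB, (1, 7))))  # J = 7 source tokens
```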
