Abstract

Paraphrase generation (PG) is of great importance to many downstream tasks in natural language processing. Diversity is essential to PG for enhancing the generalization capability and robustness of downstream applications. Recently, neural sequence-to-sequence (Seq2Seq) models have shown promising results in PG. However, traditional model training for PG optimizes model predictions against a single reference with a cross-entropy loss, an objective that fails to encourage the model to generate diverse paraphrases. In this work, we present a novel multi-objective learning approach to PG. We propose a learning-exploring method that generates sentences as learning objectives from the learned data distribution, and employ reinforcement learning to combine these new learning objectives for model training. We first design a sample-based algorithm to explore diverse sentences. We then introduce several reward functions that evaluate the sampled sentences as learning signals in terms of expressive diversity and semantic fidelity, aiming to generate diverse and high-quality paraphrases. To effectively optimize the model across these different evaluation aspects, we use a GradNorm-based algorithm that automatically balances the training objectives. Experiments and analyses on the Quora and Twitter datasets demonstrate that our proposed method not only gains a significant increase in diversity but also improves generation quality over several state-of-the-art baselines.

Highlights

  • Paraphrase generation (PG) creates different expressions that share the same meaning (e.g., “how far is Earth from Sun” and “what is the distance between Sun and Earth”)

  • In order to enable the model to learn to generate diverse paraphrases, we propose to equip the model with several vital components: (1) sample-based exploring algorithm to generate diverse candidate paraphrases; (2) multiple reward functions for evaluating sampled sentences to ensure expressive diversity and semantic fidelity simultaneously; (3) GradNorm-based algorithm that automatically balances training objectives for effective learning

  • A main reason is that every test case has only one reference sentence, which makes it harder for word-matching-based evaluation metrics to measure the real quality of diverse paraphrases
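The GradNorm-based balancing mentioned in the second highlight can be illustrated with a minimal sketch. This is our own simplified toy, not the paper's implementation: function names, the learning rate, and all numbers are illustrative, and we assume each objective's gradient norm scales linearly with its loss weight so that the derivative of the norm with respect to the weight can be approximated analytically instead of via autograd.

```python
# Simplified GradNorm-style weight balancing for multiple training objectives
# (e.g., a diversity reward vs. a semantic-fidelity reward). Illustrative only.

def gradnorm_update(weights, grad_norms, losses, initial_losses, alpha=1.5, lr=0.1):
    """One balancing step: move each objective's gradient norm toward a common
    target, giving slower-improving objectives larger targets, then renormalize
    the weights so they sum to the number of objectives."""
    n = len(weights)
    mean_norm = sum(grad_norms) / n
    # Relative inverse training rates: objectives whose loss has decayed less
    # (worse progress) receive a larger target gradient norm.
    ratios = [loss / init for loss, init in zip(losses, initial_losses)]
    mean_ratio = sum(ratios) / n
    targets = [mean_norm * (r / mean_ratio) ** alpha for r in ratios]
    # Gradient of the L1 balancing loss |G_i - target_i| w.r.t. w_i is
    # sign(G_i - target_i) * dG_i/dw_i; assuming G_i is proportional to w_i,
    # dG_i/dw_i = G_i / w_i (a simplifying assumption of this sketch).
    new_weights = []
    for w, g, t in zip(weights, grad_norms, targets):
        sign = 1.0 if g > t else -1.0
        new_weights.append(w - lr * sign * (g / w))
    # Renormalize so the weights always sum to the number of objectives.
    total = sum(new_weights)
    return [w * n / total for w in new_weights]

# Toy usage: objective 0 has barely improved (loss 0.9 of its initial value),
# so the update shifts weight toward it.
w = gradnorm_update([1.0, 1.0], grad_norms=[4.0, 1.0],
                    losses=[0.9, 0.3], initial_losses=[1.0, 1.0])
```

In a real training loop the gradient norms would be computed by backpropagating each weighted objective through the shared encoder parameters; the sketch above only captures the direction of the weight update.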


Summary

Introduction

Paraphrase generation (PG) creates different expressions that share the same meaning (e.g., “how far is Earth from Sun” and “what is the distance between Sun and Earth”). It is a crucial technology in many downstream natural language processing (NLP) applications such as question answering (Dong et al, 2017), machine translation (Zhou et al, 2019), and text summarization (Zhao et al, 2018). We hope to generate diverse paraphrases while preserving the same meaning, which is important for enhancing the generalization capability and robustness of downstream applications (Iyyer et al, 2018). Such paraphrase pairs express the same meaning but with different degrees of diversity.
