Exploring Bilingual Parallel Corpora for Syntactically Controllable Paraphrase Generation

Mingtong Liu, Deyi Xiong, Jinan Xu, Chen Sheng, Erguang Yang, Yujie Zhang, Changjian Hu, Yufeng Chen

doi:10.24963/ijcai.2020/547

Abstract

Paraphrase generation is of great importance to many downstream tasks in natural language processing. Recent efforts have focused on generating paraphrases in specific syntactic forms, an approach that generally relies heavily on manually annotated paraphrase data, which is not easily available for many languages and domains. In this paper, we propose a novel end-to-end framework that leverages existing large-scale bilingual parallel corpora to generate paraphrases under the control of syntactic exemplars. To train a single model over the two languages of a parallel corpus, we embed sentences from both languages into the same content and style spaces, with shared content and style encoders built on cross-lingual word embeddings. We propose an adversarial discriminator to disentangle the content and style spaces, and employ a latent variable to model the syntactic style of a given exemplar, which guides the two decoders during generation. Additionally, we introduce cycle and masking learning schemes to train the model efficiently. Experiments and analyses demonstrate that the proposed model, trained only on bilingual parallel data, is capable of generating diverse paraphrases with desirable syntactic styles. Fine-tuning the trained model on a small paraphrase corpus makes it substantially outperform state-of-the-art paraphrase generation models trained on a larger paraphrase dataset.

Full Text