Abstract

Story generation, namely generating a reasonable story from a leading context, is an important but challenging task. Despite their success at modeling fluency and local coherence, existing neural language generation models (e.g., GPT-2) still suffer from repetition, logic conflicts, and a lack of long-range coherence in generated stories. We conjecture that this is because of the difficulty of associating relevant commonsense knowledge, understanding causal relationships, and planning entities and events in the proper temporal order. In this paper, we devise a knowledge-enhanced pretraining model for commonsense story generation. We propose to utilize commonsense knowledge from external knowledge bases to generate reasonable stories. To further capture the causal and temporal dependencies between the sentences of a reasonable story, we employ multi-task learning, which combines the generation objective with a discriminative objective of distinguishing true from fake stories during fine-tuning. Automatic and manual evaluation shows that our model can generate more reasonable stories than state-of-the-art baselines, particularly in terms of logic and global coherence.
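To make the multi-task objective concrete, the sketch below combines a standard language-modeling loss over the story tokens with an auxiliary classification loss that distinguishes true stories from fake (e.g., shuffled or corrupted) ones. This is a minimal illustration, not the authors' exact implementation: the classifier head, the pooling of the last hidden state, and the loss weight `alpha` are all assumptions.

```python
# Hedged sketch of a multi-task fine-tuning objective for GPT-2:
# LM loss + auxiliary true/fake story classification.
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical binary head over the final hidden state of the last
# token (true story vs. fake story); not part of the released model.
classifier = nn.Linear(model.config.n_embd, 2)
alpha = 0.5  # assumed weight balancing the two losses

def multitask_loss(input_ids, is_true_story):
    # Generation objective: next-token prediction over the story.
    outputs = model(input_ids, labels=input_ids, output_hidden_states=True)
    lm_loss = outputs.loss
    # Discriminative objective: classify the story as true or fake
    # from the last token's final-layer hidden state.
    last_hidden = outputs.hidden_states[-1][:, -1, :]
    logits = classifier(last_hidden)
    cls_loss = nn.functional.cross_entropy(logits, is_true_story)
    return lm_loss + alpha * cls_loss

ids = tokenizer("Jenny went hiking. She packed water and a map.",
                return_tensors="pt").input_ids
loss = multitask_loss(ids, torch.tensor([1]))  # label 1 = true story
loss.backward()
```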

Highlights

  • Story generation is a strong indicator of machine understanding of natural language

  • We propose a knowledge-enhanced pretraining model for commonsense story generation by extending GPT-2 with external commonsense knowledge

  • Our model outperforms the variants of GPT-2 in terms of perplexity, and achieves higher BLEU scores than all the baselines, indicating better fluency and greater overlap with the reference stories

Introduction

Story generation is a strong indicator of machine understanding of natural language. It is often approached as selecting a sequence of events to form a story with a reasonable logic or plot. Pretrained GPT-2 has been shown to capture useful semantic and syntactic features (Alt et al., 2019), as demonstrated by state-of-the-art performance on some generation tasks such as machine translation and text summarization (Radford et al., 2019). In contrast to such tasks, whose source inputs contain sufficient information to generate the desired target texts, story generation is a typical open-ended generation task, where only very limited information is given in the input.
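The open-ended setting can be seen in the plain GPT-2 baseline below, which continues a short leading context. This is a minimal sketch of conditional generation with an off-the-shelf model (not the knowledge-enhanced model of this paper); the prompt and sampling parameters are illustrative assumptions.

```python
# Hedged sketch: open-ended story continuation from a leading context
# with an off-the-shelf GPT-2 (the baseline setting).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

context = "Jenny had always wanted to see the ocean."
input_ids = tokenizer(context, return_tensors="pt").input_ids

output_ids = model.generate(
    input_ids,
    max_length=80,          # the context alone underdetermines the story
    do_sample=True,         # sample, since many continuations are valid
    top_p=0.9,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```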
