Abstract

Recurrent Neural Network (RNN) based abstractive text summarization models have made great progress over the past few years, driven largely by the encoder-decoder architecture. However, little work has addressed the generation of relatively long summaries. In this paper, we concentrate on two prominent problems in long summary generation. First, although significant efforts have been made to help the encoder handle long sequences, the decoder still struggles with them owing to the limited storage capacity of RNNs. We propose a simple and effective approach called history reuse, which first mines critical information from the history summary sequence and then transmits that information to the decoder. Second, since encoder-decoder models are typically trained to produce exactly the same summary as the target, certain word order deviations between the predicted and target summaries are excessively punished. Accordingly, we introduce a fully differentiable loss called bag-of-words (BoW) loss, which exploits the fact that BoW representations discard word order information and computes the difference between the two summaries in the BoW space. Experiments on two benchmark datasets, CNN/Daily Mail and PubMed, demonstrate that our methods significantly improve over the baseline.
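
The abstract does not give the exact formulation of the BoW loss; as a rough illustration of the general idea only, a minimal PyTorch-style sketch might look like the following. The function name `bow_loss`, the argument names, and the L1 comparison of normalized counts are assumptions for illustration, not the authors' definition. The key point it demonstrates is that summing the decoder's per-step output distributions gives a soft, order-free representation of the predicted summary, so the comparison stays fully differentiable.

```python
import torch
import torch.nn.functional as F

def bow_loss(decoder_probs, target_ids, vocab_size, pad_id=0):
    """Illustrative bag-of-words (BoW) loss sketch (assumed formulation).

    decoder_probs: (batch, dec_len, vocab) per-step output distributions.
    target_ids:    (batch, tgt_len) reference summary token ids.
    """
    # Soft predicted BoW counts: sum the per-step distributions over time
    # (no argmax, so gradients flow through the whole decoder output).
    pred_bow = decoder_probs.sum(dim=1)                          # (batch, vocab)

    # Hard target BoW counts from the reference ids, ignoring padding.
    one_hot = F.one_hot(target_ids, vocab_size).float()          # (batch, tgt_len, vocab)
    mask = (target_ids != pad_id).unsqueeze(-1).float()
    target_bow = (one_hot * mask).sum(dim=1)                     # (batch, vocab)

    # Normalize both to distributions and compare; word order plays no role.
    pred_dist = pred_bow / pred_bow.sum(dim=-1, keepdim=True).clamp(min=1e-8)
    target_dist = target_bow / target_bow.sum(dim=-1, keepdim=True).clamp(min=1e-8)
    return F.l1_loss(pred_dist, target_dist)
```

In practice such a term would presumably be added to the usual token-level cross-entropy loss with a weighting coefficient, so that word-order-insensitive matching complements, rather than replaces, the standard training objective.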
