Abstract

The abstractive method and the extractive method are the two main approaches to automatic document summarization. In this paper, to fully exploit the relatedness and complementary advantages of the two approaches, we propose a general unified framework for abstractive summarization that incorporates extractive summarization as an auxiliary task. In particular, our framework is composed of a shared hierarchical document encoder, a decoder based on a hierarchical attention mechanism, and an extractor. We adopt multi-task learning to train the two tasks jointly, which enables the shared encoder to better capture the semantics of the document. Moreover, as our main task is abstractive summarization, we constrain the attention learned in the abstractive task with the labels of the extractive task to strengthen the consistency between the two tasks. Experiments on the CNN/DailyMail dataset demonstrate that both the auxiliary task and the attention constraint contribute significantly to the improvement in performance, and that our model is comparable to state-of-the-art abstractive models. In addition, we halve the number of labels for the extractive task, pretrain the extractor, and jointly train the two tasks, using the estimated sentence salience from the extractive task to constrain the attention of the abstractive task. The results degrade only slightly compared with using fully labeled data for the auxiliary task.
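To make the architecture described above concrete, the sketch below shows one plausible shape of the shared hierarchical document encoder with an extractor head: a word-level BiGRU encodes each sentence, a sentence-level BiGRU runs over the pooled sentence vectors, and a linear layer scores sentence salience for the auxiliary extractive task. This is a minimal illustration under assumed dimensions and module names (`HierarchicalEncoder`, `hid_dim`, mean-pooling of word states), not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    """Illustrative shared encoder: word-level BiGRU per sentence, then a
    sentence-level BiGRU over pooled sentence vectors, plus a linear head
    that scores sentence salience for the extractive auxiliary task."""

    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.word_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True,
                               bidirectional=True)
        self.sent_rnn = nn.GRU(2 * hid_dim, hid_dim, batch_first=True,
                               bidirectional=True)
        self.salience = nn.Linear(2 * hid_dim, 1)  # extractor head

    def forward(self, doc):
        # doc: (batch, n_sents, n_words) word ids, 0 = padding
        b, s, w = doc.size()
        words = self.embed(doc.view(b * s, w))               # (b*s, w, emb)
        word_states, _ = self.word_rnn(words)                # (b*s, w, 2*hid)
        sent_vecs = word_states.mean(dim=1).view(b, s, -1)   # pool words per sentence
        sent_states, _ = self.sent_rnn(sent_vecs)            # (b, s, 2*hid)
        salience_logits = self.salience(sent_states).squeeze(-1)  # (b, s)
        word_states = word_states.view(b, s, w, -1)
        return word_states, sent_states, salience_logits
```

A decoder with hierarchical attention would then attend over `sent_states` (sentence level) and `word_states` (word level), while `salience_logits` feed the extractive loss and the attention constraint.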

Highlights

  • Automatic document summarization has been studied for decades

  • The neural attentional abstractive summarization model was first applied in sentence compression [36], where the input sequence is encoded by a convolutional network and the output sequence is decoded by a standard feedforward Neural Network Language Model (NNLM)

  • We have presented a sequence-to-sequence model with hierarchical document encoder and hierarchical attention for abstractive summarization, and incorporated extractive summarization as an auxiliary task

Introduction

Automatic document summarization has been studied for decades. Its goal is to generate a shorter passage from a document in a grammatically and logically coherent way while preserving the important information. A hierarchical document encoder can capture both the local and global semantics of a document, resulting in better feature learning, and it improves training efficiency because the computational complexity of the RNN-based model is reduced by dividing the long document into short sentences. Multi-task learning has been successfully applied to a wide range of tasks across computer vision [18], speech recognition [12], and natural language processing [11]; it improves generalization by leveraging the domain-specific information contained in the training signals of related tasks [5]. The extractor and the decoder are jointly trained, which captures better semantics of the document. As both the sentence salience scores in the extractor and the sentence-level attention in the decoder indicate the importance of source sentences, we constrain the learned sentence-level attention [Fig. 1(4)] with the sentence salience scores of the extractor in order to strengthen their consistency, as sketched below.
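The following is a minimal sketch of how the joint objective could combine the abstractive loss, the auxiliary extractive loss, and an attention-consistency term. The weights `lambda_ext` and `lambda_attn`, the use of a KL penalty for the consistency term, and the assumption that `sent_attention` is the sentence-level attention averaged over decoding steps are all illustrative assumptions; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def joint_loss(decoder_logits, target_ids,        # abstractive task
               salience_logits, extract_labels,   # extractive auxiliary task
               sent_attention,                     # (batch, n_sents), sums to 1
               lambda_ext=0.5, lambda_attn=0.1):
    """Abstractive NLL + extractive BCE + an attention-consistency term that
    pushes the sentence-level attention toward the distribution implied by
    the extractive labels (or estimated salience scores)."""
    # Word-level negative log-likelihood of the reference summary.
    abs_loss = F.cross_entropy(
        decoder_logits.view(-1, decoder_logits.size(-1)),
        target_ids.view(-1), ignore_index=0)       # assume 0 is padding

    # Per-sentence binary cross-entropy against the extraction labels.
    ext_loss = F.binary_cross_entropy_with_logits(
        salience_logits, extract_labels.float())

    # Consistency: KL divergence between the sentence-level attention and
    # the normalized extractive labels.
    target_dist = extract_labels.float()
    target_dist = target_dist / target_dist.sum(-1, keepdim=True).clamp(min=1e-8)
    attn_loss = F.kl_div(sent_attention.clamp(min=1e-8).log(),
                         target_dist, reduction='batchmean')

    return abs_loss + lambda_ext * ext_loss + lambda_attn * attn_loss
```

With half of the extractive labels removed, the same consistency term can instead target the pretrained extractor's estimated salience distribution, matching the semi-supervised setting described in the abstract.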

Neural Summarization Model
Shared Hierarchical Document Encoder
Sentence Extractor
Decoder
Hierarchical Attention
Multi-Task Learning
Dataset
Implementation Details
Evaluation of Proposed Components
Comparison with Baselines
Method
Discussions
Case Study
Related Work
Conclusion