Abstract

Unsupervised extractive summarization aims to extract salient sentences from a document without a labeled corpus. Existing methods have made promising progress thanks to large-scale pre-trained language models and high-quality contextualized representations. However, because extractive summaries are spliced together from separate sentences, they often lack smooth transitions and struggle to form coherent, fluent text. To the best of our knowledge, very few studies currently focus on unsupervised abstractive summarization. Inspired by the intuitive human process of writing summaries, which involves first extracting salient sentences and then reconstructing them, we propose an Extract-then-Abstract framework to generate more coherent and human-like summaries. Specifically, in the extraction stage, we adopt an extractive summarization model as the summarizer to generate an extractive summary. Then, in the abstraction stage, we propose a BART-based sentence rewrite model to generate a more coherent and fluent abstractive summary. To train this rewrite model, we design a novel parallel data creation method based on an effective sentence sampling strategy that requires no manual annotation. Extensive experiments, including automatic and human evaluation, demonstrate that our framework consistently outperforms strong baselines for unsupervised abstractive summarization and generates more coherent and human-like summaries while maintaining ROUGE scores competitive with unsupervised extractive summarization.
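
The two-stage flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the word-frequency scorer is a simple stand-in for whatever unsupervised extractive summarizer is used, and `rewrite` is a hypothetical pluggable callable standing in for the BART-based rewrite model. All function names here are assumptions made for illustration.

```python
import re
from collections import Counter
from typing import Callable, List


def split_sentences(document: str) -> List[str]:
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]


def extract(sentences: List[str], k: int) -> List[str]:
    # Stand-in extractor (assumption): score each sentence by the average
    # document-level frequency of its words, keep the top-k, and return
    # them in their original document order.
    tokenized = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    freq = Counter(w for toks in tokenized for w in toks)
    scores = [
        sum(freq[w] for w in toks) / len(toks) if toks else 0.0
        for toks in tokenized
    ]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def identity_rewrite(sentences: List[str]) -> str:
    # Placeholder for the abstraction stage: simply joins the extracted
    # sentences. A trained sequence-to-sequence rewriter would go here.
    return " ".join(sentences)


def extract_then_abstract(
    document: str,
    rewrite: Callable[[List[str]], str] = identity_rewrite,
    k: int = 2,
) -> str:
    # Stage 1: extraction. Stage 2: abstraction (rewriting).
    extracted = extract(split_sentences(document), k)
    return rewrite(extracted)


if __name__ == "__main__":
    doc = "Cats sleep a lot. Cats eat fish. Dogs bark."
    print(extract_then_abstract(doc, k=2))
```

Keeping the rewriter as a plain callable means the extraction stage can be evaluated on its own, and any rewrite model can be swapped in without changing the pipeline.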
