Abstract

Automatic text summarization is a method used to compress documents while preserving the main idea of the original text, including extractive summarization and abstractive summarization. Extractive text summarization extracts important sentences from the original document to serve as the summary. The document representation method is crucial for the quality of the generated summarization. To effectively represent the document, we propose a hierarchical document representation model Long-Trans-Extr for Extractive Summarization, which uses Longformer as the sentence encoder and Transformer as the document encoder. The advantage of Longformer as sentence encoder is that the model can input long document up to 4096 tokens with adding relative a little calculation. The proposed model Long-Trans-Extr is evaluated on three benchmark datasets: CNN (Cable News Network), DailyMail, and the combined CNN/DailyMail. It achieves 43.78 (Rouge-1) and 39.71 (Rouge-L) on CNN/DailyMail and 33.75 (Rouge-1), 13.11 (Rouge-2), and 30.44 (Rouge-L) on the CNN datasets. They are very competitive results, and furthermore, they show that our model has better performance on long documents, such as the CNN corpus.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.