Semantic Self-Segmentation for Abstractive Summarization of Long Documents in Low-Resource Regimes

Gianluca Moro, Luca Ragazzi

doi:10.1609/aaai.v36i10.21357

Abstract

The quadratic memory complexity of transformers prevents long document summarization in low computational resource scenarios. State-of-the-art models must truncate their input, discarding potentially summary-relevant content and causing a drop in performance. This loss is particularly harmful for semantic text analytics in high-impact domains such as law. In this paper, we propose a novel semantic self-segmentation (Se3) approach for long document summarization that addresses two critical problems of low-resource regimes: processing inputs longer than GPU memory capacity allows, and producing accurate summaries when only a few dozen training instances are available. Se3 segments a long input into semantically coherent chunks, allowing transformers to summarize very long documents without truncation by summarizing each chunk and concatenating the results. Experimental results show that the approach significantly improves the performance of abstractive summarization transformers, even with just a dozen labeled instances, achieving new state-of-the-art results on two legal datasets of different domains and contents. Finally, we report ablation studies that evaluate the contribution of each component of our method to the performance gain.
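The segment-summarize-concatenate flow described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the bag-of-words `embed` function, the greedy grouping rule, the `max_tokens` and `threshold` parameters, and the `summarize_chunk` placeholder are all assumptions introduced here. In a real setting a learned sentence encoder and an abstractive transformer would presumably take their places.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def segment(sentences, max_tokens=50, threshold=0.2):
    """Greedily group consecutive sentences into chunks.

    A new chunk starts when adding the next sentence would exceed
    max_tokens (whitespace tokens), or when the sentence is not similar
    enough to the chunk built so far. Both rules are hypothetical.
    """
    chunks, current = [], []
    for sentence in sentences:
        if current:
            size = sum(len(s.split()) for s in current) + len(sentence.split())
            similarity = cosine(embed(" ".join(current)), embed(sentence))
            if size > max_tokens or similarity < threshold:
                chunks.append(current)
                current = []
        current.append(sentence)
    if current:
        chunks.append(current)
    return chunks


def summarize_chunk(chunk):
    """Placeholder summarizer: returns the chunk's first sentence.

    An abstractive model would be called here instead (assumption).
    """
    return chunk[0]


def summarize_document(sentences, **kwargs):
    """Summarize each chunk independently and concatenate the results."""
    return " ".join(summarize_chunk(c) for c in segment(sentences, **kwargs))
```

Because every chunk is bounded by `max_tokens`, the summarizer never sees the whole document at once, which is the property that lets the input exceed what a single forward pass could hold in memory.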

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 17
