In recent years, the exponential growth of scientific literature has made it increasingly difficult for researchers and practitioners to keep up with new discoveries and developments in their fields. As a result, text summarization has become one of the primary tasks of natural language processing. Abstractive summarization of long documents, such as scientific articles, requires large neural networks with high memory and computation requirements. It is therefore all the more important to find ways to increase the efficiency of long document summarization models. The objects of this research are transformer models for long document summarization and the Unlimiformer cross-attention modification. The article reviews the basic principles of transformer attention, which constitutes the primary computational expense in transformer models. More efficient self-attention approaches used in long document summarization models are described, such as the global + sliding window attention used by Longformer. The Unlimiformer cross-attention mechanism, which allows a model to accept inputs of unbounded length, is described in detail. The objective of the study is the development and evaluation of a long document summarization model using the Unlimiformer modification. To achieve this goal, a Longformer Encoder-Decoder (LED) model pretrained on the arXiv dataset is modified with Unlimiformer cross-attention. This modification can be applied without additional fine-tuning, avoiding the cost of further training a model with a large sequence length. The developed model was evaluated on the arXiv dataset using the ROUGE-1, ROUGE-2 and ROUGE-L metrics. It showed improved results compared to the baseline model, demonstrating the viability of this approach for improving long document summarization models.
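To make the described setup concrete, the following is a minimal sketch of how a pretrained LED summarization model could be evaluated on arXiv with ROUGE, with the Unlimiformer wrapping step indicated as a comment. The checkpoint name, the dataset identifier, the Unlimiformer API, and the generation settings are illustrative assumptions, not the exact configuration used in the study.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from datasets import load_dataset
import evaluate

# Baseline: a Longformer Encoder-Decoder (LED) model fine-tuned on arXiv
# (assumed checkpoint, chosen for illustration only).
checkpoint = "allenai/led-large-16384-arxiv"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Hypothetical step: wrap the model with Unlimiformer cross-attention.
# The exact call depends on the Unlimiformer code base (assumed API below);
# no additional fine-tuning is needed, since only cross-attention is modified.
# from unlimiformer import Unlimiformer
# model = Unlimiformer.convert_model(model)

# Assumed dataset ID for the arXiv summarization benchmark.
dataset = load_dataset("ccdv/arxiv-summarization", split="test[:8]")
rouge = evaluate.load("rouge")

predictions, references = [], []
for example in dataset:
    # The baseline must truncate to the model's maximum input length;
    # with Unlimiformer applied, truncation would not be required.
    inputs = tokenizer(example["article"], return_tensors="pt",
                       truncation=True, max_length=16384)
    summary_ids = model.generate(**inputs, max_length=512, num_beams=4)
    predictions.append(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
    references.append(example["abstract"])

# ROUGE-1, ROUGE-2 and ROUGE-L, as in the evaluation described above.
print(rouge.compute(predictions=predictions, references=references))
```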