Introducing bidirectional attention for autoregressive models in abstractive summarization

Jianfei Zhao, Xin Sun, Chong Feng

doi:10.1016/j.ins.2024.121497

Abstract

Abstractive summarization methods typically follow the autoregressive paradigm, using causal masks in the decoder for training and inference efficiency. However, this approach leads to a constant context throughout the generation process, which conflicts with the bidirectional characteristics of natural language. Although previous attempts have been made to incorporate bidirectional attention into the decoding process through non-autoregressive approaches, their evaluation results are not comparable to those of autoregressive methods. To bring bidirectional attention to the autoregressive process while maintaining superior performance, we propose the global autoregressive paradigm, which takes the outputs of the autoregressive process as additional inputs in the subsequent global iteration. Specifically, we build a bidirectional decoder alongside the original encoder and decoder to capture the bidirectional context of the outputs. This context is updated after each autoregressive decoding iteration. The decoder then integrates the updated context into subsequent autoregressive decoding steps, enhancing the generative process with a more comprehensive and authentic context. Additionally, we use contrastive learning to train the model to extract reliable features from the bidirectional context, and we apply reinforcement learning to improve the model's utilization of this context. We evaluate our method on the CNN/DM, XSum, and NYT datasets, and the results highlight the significance of the bidirectional context. Our method achieves the best performance in terms of ROUGE-2 on CNN/DM (23.96) and performs comparably on XSum (25.45) and NYT (27.91). It also outperforms all the baselines in terms of BERTScore, with scores of 89.96 on CNN/DM, 92.70 on XSum, and 90.04 on NYT. Furthermore, our method can perform better with a larger beam size.
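The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`causal_mask`, `bidirectional_mask`, `global_autoregressive_decode`), the toy step function, and the toy context encoder are all hypothetical stand-ins, and greedy decoding replaces beam search for brevity. The sketch only shows the difference between a causal and a bidirectional attention mask, and how the output of one autoregressive pass could be re-read bidirectionally and fed into the next global iteration.

```python
from typing import Any, Callable, List, Optional


def causal_mask(n: int) -> List[List[bool]]:
    """Position i may attend only to positions j <= i (autoregressive decoder)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> List[List[bool]]:
    """Every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


def global_autoregressive_decode(
    ar_step: Callable[[List[int], Optional[Any]], int],
    encode_bidirectional: Callable[[List[int]], Any],
    max_len: int,
    num_global_iters: int,
    eos: int,
) -> List[int]:
    """Run several autoregressive passes; after each pass, rebuild the
    bidirectional context from the full draft and reuse it in the next pass."""
    context: Optional[Any] = None  # no draft exists before the first pass
    draft: List[int] = []
    for _ in range(num_global_iters):
        tokens: List[int] = []
        while len(tokens) < max_len:
            nxt = ar_step(tokens, context)  # left-to-right step, plus global context
            tokens.append(nxt)
            if nxt == eos:
                break
        draft = tokens
        context = encode_bidirectional(draft)  # updated after each pass
    return draft


# Toy stand-ins (assumptions for illustration only, not real models).
EOS = 0


def toy_ar_step(prefix: List[int], context: Optional[Any]) -> int:
    base = [1, 2, 3, EOS]
    token = base[len(prefix)]
    if context is not None and token != EOS:
        token += context["length"]  # pretend the global feature shifts the choice
    return token


def toy_bidirectional(draft: List[int]) -> Any:
    return {"length": len(draft)}  # a single feature computed over the whole draft


first_pass = global_autoregressive_decode(toy_ar_step, toy_bidirectional, 4, 1, EOS)
second_pass = global_autoregressive_decode(toy_ar_step, toy_bidirectional, 4, 2, EOS)
```

In this toy setup the second global iteration produces a different sequence from the first because it conditions on a feature computed from the entire first draft, which is the role the abstract assigns to the bidirectional decoder. A real system would replace the two toy callables with neural modules.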

Published in: Information Sciences
