Memory degradation induced by attention in recurrent neural architectures

Mykola Harvat, José D Martín-Guerrero

doi:10.1016/j.neucom.2022.06.056

Open Access

Journal: Neurocomputing
Publication Date: Jun 27, 2022
Citations: 2
License type: CC BY-NC-ND

Affiliation: University of Valencia

Abstract

This paper studies the memory mechanisms of recurrent neural architectures when attention models are included. Pure-attention models such as Transformers are increasingly popular because they tend to outperform models with recurrent connections in many different tasks. Our conjecture is that attention prevents the recurrent connections from properly transferring information between consecutive time steps. This conjecture is tested empirically using five models: a model without attention, a standard Luong attention model, a standard Bahdanau attention model, and, for both Luong and Bahdanau attention, our proposal to add attention to the inputs in order to bridge the gap between recurrent and parallel architectures. The five models are assessed on eight problems: a sequence-reverse copy problem, a sequence-reverse copy problem with repetitions, a filter sequence problem, a sequence-reverse copy problem with bigrams, and four translation problems (English to Spanish, English to French, English to German and English to Italian). The results support our conjecture on the interaction between attention and recurrence.
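
For readers unfamiliar with the two attention variants named in the abstract, the following is a minimal NumPy sketch of the textbook scoring functions: multiplicative ("general") scoring as commonly attributed to Luong, and additive scoring as commonly attributed to Bahdanau. It is not the authors' implementation; the abstract does not specify their architectures. The function names, dimensions, and random weights (`W`, `W_dec`, `W_enc`, `v`) are assumptions chosen only for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def luong_attention(decoder_state, encoder_states, W):
    """Multiplicative ("general") attention.

    score(h_t, h_s) = h_t^T W h_s for every encoder state h_s.
    Returns the context vector and the attention weights.
    """
    scores = (decoder_state @ W) @ encoder_states.T  # shape: (src_len,)
    weights = softmax(scores)
    context = weights @ encoder_states               # shape: (dim,)
    return context, weights


def bahdanau_attention(decoder_state, encoder_states, W_dec, W_enc, v):
    """Additive attention.

    score(s, h_s) = v^T tanh(W_dec s + W_enc h_s) for every encoder state h_s.
    Returns the context vector and the attention weights.
    """
    # Broadcasting adds the projected decoder state to each projected encoder state.
    hidden = np.tanh(encoder_states @ W_enc.T + W_dec @ decoder_state)
    scores = hidden @ v                              # shape: (src_len,)
    weights = softmax(scores)
    context = weights @ encoder_states               # shape: (dim,)
    return context, weights


# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
src_len, dim, attn_dim = 5, 4, 3
encoder_states = rng.normal(size=(src_len, dim))
decoder_state = rng.normal(size=dim)
W = rng.normal(size=(dim, dim))
W_dec = rng.normal(size=(attn_dim, dim))
W_enc = rng.normal(size=(attn_dim, dim))
v = rng.normal(size=attn_dim)

luong_context, luong_weights = luong_attention(decoder_state, encoder_states, W)
bahdanau_context, bahdanau_weights = bahdanau_attention(
    decoder_state, encoder_states, W_dec, W_enc, v
)
```

In both variants the context vector is a weighted average of all encoder states, so the decoder can read the source sequence directly at every step instead of relying only on what its recurrent state has carried forward. This is the interaction between attention and recurrence that the conjecture addresses.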
