Abstract
Common recurrent neural architectures scale poorly due to the intrinsic difficulty in parallelizing their state computations. In this work, we propose the Simple Recurrent Unit (SRU), a light recurrent unit that balances model capacity and scalability. SRU is designed to provide expressive recurrence, enable highly parallelized implementation, and comes with careful initialization to facilitate training of deep models. We demonstrate the effectiveness of SRU on multiple NLP tasks. SRU achieves 5–9x speed-up over cuDNN-optimized LSTM on classification and question answering datasets, and delivers stronger results than LSTM and convolutional models. We also obtain an average of 0.7 BLEU improvement over the Transformer model (Vaswani et al., 2017) on translation by incorporating SRU into the architecture.
Highlights
On the movie review (MR) dataset for instance, Simple Recurrent Unit (SRU) completes 100 training epochs within 40 seconds, while Long Short-term Memory (LSTM) takes over 320 seconds
SRU exhibits over 5x speed-up over LSTM and 53–63% reduction in total training time
Our 5-layer model obtains an average improvement of 0.7 test BLEU score, and an improvement of 0.5 BLEU score when comparing the best results of each model across three runs
Summary
Recurrent neural networks (RNNs) are at the core of state-of-the-art approaches for a large number of natural language tasks, including machine translation (Cho et al., 2014; Bahdanau et al., 2015; Jean et al., 2015; Luong et al., 2015), language modeling (Zaremba et al., 2014; Gal and Ghahramani, 2016; Zoph and Le, 2016), opinion mining (Irsoy and Cardie, 2014), and situated language understanding (Mei et al., 2016; Misra et al., 2017; Suhr et al., 2018; Suhr and Artzi, 2018). The difficulty of scaling recurrent networks arises from the time dependence of state computation. In common architectures, such as Long Short-term Memory (LSTM; Hochreiter and Schmidhuber, 1997) and Gated Recurrent Units (GRU; Cho et al., 2014), the computation of each step is suspended until the complete execution of the previous step. This sequential dependency makes recurrent networks significantly slower than other operations, and limits their applicability.

SRU replaces the use of convolutions (i.e., n-gram filters), as in QRNN and KNN, with more recurrent connections. This retains modeling capacity while using less computation (and fewer hyper-parameters). We obtain an average improvement of 0.7 BLEU score on the English-to-German translation task by incorporating SRU into the Transformer (Vaswani et al., 2017).
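To make the recurrence concrete, below is a minimal, unoptimized sketch of a single SRU layer in plain numpy, following the light-recurrence and highway equations described in the paper. The function and parameter names (sru_forward, W, W_f, W_r, v_f, v_r, b_f, b_r) are illustrative assumptions; the paper's fused CUDA kernel, scaling correction, and initialization scheme are omitted here.

```python
# Minimal sketch of the SRU light recurrence (single layer, no batching).
# Assumes input dimension equals hidden dimension so the highway connection
# h_t = r_t * c_t + (1 - r_t) * x_t is well-defined.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sru_forward(x_seq, W, W_f, W_r, v_f, v_r, b_f, b_r):
    """x_seq: (T, d) input sequence; W, W_f, W_r: (d, d); v_f, v_r, b_f, b_r: (d,)."""
    T, d = x_seq.shape
    # The heavy matrix multiplications have no time dependence, so they can be
    # computed for all time steps at once -- the source of SRU's parallelism.
    U  = x_seq @ W.T    # candidate input  W x_t
    Uf = x_seq @ W_f.T  # forget-gate term W_f x_t
    Ur = x_seq @ W_r.T  # reset-gate term  W_r x_t

    c = np.zeros(d)     # internal state c_t
    outputs = []
    for t in range(T):
        c_prev = c
        # Gates use only element-wise (vector) recurrence on c_{t-1}.
        f = sigmoid(Uf[t] + v_f * c_prev + b_f)   # forget gate
        c = f * c_prev + (1.0 - f) * U[t]         # light recurrence on the state
        r = sigmoid(Ur[t] + v_r * c_prev + b_r)   # reset gate
        h = r * c + (1.0 - r) * x_seq[t]          # highway connection to the input
        outputs.append(h)
    return np.stack(outputs), c
```

The per-step work is reduced to element-wise operations, which is what allows the sequential loop to remain cheap while the matrix multiplications are batched across time.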