Accelerating Inference In Long Short-Term Memory Neural Networks

Thomas Mealey, Tarek M. Taha

doi:10.1109/naecon.2018.8556674

Abstract

Long Short-Term Memory (LSTM) is a powerful neural network algorithm that has been shown to provide state-of-the-art performance in various sequence learning tasks, including natural language processing, video classification, and speech recognition. Once an LSTM model has been trained on a dataset, its utility comes from its ability to infer information from completely new data. Because LSTM models are highly complex, the inference stage can require significant computing power and memory resources to keep up with a real-time workload. Many approaches have been taken to accelerate inference, from offloading computations to GPUs or other specialized hardware to compressing model parameters, which reduces both the number of computations and the memory footprint. This work takes a two-pronged approach to accelerating LSTM inference. First, a model compression scheme called binarization is identified that both reduces the storage size of model parameters and simplifies computations. This technique is applied when training LSTMs for two separate sequence learning tasks, and it is shown to provide prediction performance comparable to that of the uncompressed models. Second, a digital processor architecture called the Binary Recurrent Unit (BRU) is proposed to accelerate inference for binarized LSTM models. Targeted specifically at FPGA implementation, this accelerator takes advantage of binary model weights and on-chip memory resources to parallelize LSTM inference computations. The BRU architecture is implemented and tested on a Xilinx Z7020 device clocked at 200 MHz, and its inference computation time is evaluated against CPU and GPU inference implementations. BRU is shown to outperform the CPU and GPU implementations by as much as 39 times and 3.8 times, respectively.
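To make the effect of binarization more concrete, the following is a minimal Python sketch of how binary weights simplify an LSTM inference step. The abstract does not specify the exact binarization scheme, so this sketch assumes deterministic sign binarization (each weight becomes +1 if non-negative, otherwise -1), a common gate ordering (input, forget, cell candidate, output), and hypothetical helper names (`binarize`, `binary_matvec`, `pack_bits`, `lstm_step`). It illustrates two general properties: the multiplications in the gate pre-activations reduce to additions and subtractions, and each weight can be stored in one bit rather than a 32-bit float. It is not the BRU hardware design or the authors' training procedure.

```python
import math

def binarize(weights):
    """Assumed scheme: deterministic sign binarization, +1 if w >= 0 else -1."""
    return [[1 if w >= 0 else -1 for w in row] for row in weights]

def binary_matvec(wb, x):
    """Matrix-vector product with +/-1 weights using only additions and subtractions."""
    out = []
    for row in wb:
        acc = 0.0
        for b, xi in zip(row, x):
            # A +1 weight adds the input, a -1 weight subtracts it; no multiply needed.
            acc = acc + xi if b > 0 else acc - xi
        out.append(acc)
    return out

def pack_bits(row):
    """Pack a +/-1 row into an integer bitmask: bit i is set when weight i is +1."""
    mask = 0
    for i, b in enumerate(row):
        if b > 0:
            mask |= 1 << i
    return mask

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(wb, bias, x, h, c):
    """One LSTM time step with binarized weights (hypothetical helper).

    wb has 4*H rows, each of length len(x) + len(h); rows are grouped as
    input, forget, cell-candidate, output gates (assumed ordering).
    """
    H = len(h)
    # Gate pre-activations over the concatenated [x, h] vector, plus real-valued bias.
    z = [zi + bi for zi, bi in zip(binary_matvec(wb, x + h), bias)]
    i = [sigmoid(v) for v in z[0:H]]
    f = [sigmoid(v) for v in z[H:2 * H]]
    g = [math.tanh(v) for v in z[2 * H:3 * H]]
    o = [sigmoid(v) for v in z[3 * H:4 * H]]
    # Standard LSTM state update; only the weights are binary, activations stay real.
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new
```

For example, `binarize([[0.3, -0.2], [-0.5, 0.1]])` gives `[[1, -1], [-1, 1]]`, and applying `binary_matvec` to the input `[2.0, 3.0]` gives `[-1.0, 1.0]` without a single multiplication. Keeping activations and biases real-valued while binarizing only the weights is one common design choice; it preserves most of the storage savings while limiting accuracy loss.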

Similar Papers

A Novel Hybrid Deep Neural Network to Predict Pre-impact Fall for Older People Based on Wearable Inertial Sensors.
Xiaoqun Yu ... Hai Qiu
Frontiers in Bioengineering and Biotechnology | VOL. 8 | 12 Feb 2020

Evaluation of Cryptocurrency Price Prediction Using LSTM and CNNs Models
Ng Shi Wen ... Lew Sook Ling
JOIV: International Journal on Informatics Visualization | VOL. 7 | 30 Nov 2023

Forecasting of Total Column Ozone using Regression Analysis and LSTM-RNN Machine Learning Approach
T M Anu ... K Elampari
Indian Journal Of Science And Technology | VOL. 15 | 05 Aug 2022

Strata-Constrained GWLSTM Network for Logging Lithology Prediction
Jianjun Li ... Baohai Wu
10 Jun 2023
