Abstract

Long Short-Term Memory (LSTM) has been widely adopted in tasks with sequence data, such as speech recognition and language modeling. LSTM brought significant accuracy improvements by introducing additional parameters to the Recurrent Neural Network (RNN). However, the increased number of parameters and computations also led to inefficiency when computing LSTM on edge devices with limited on-chip memory size and DRAM bandwidth. To reduce the latency and energy of LSTM computations, there has been a pressing need for model compression schemes and suitable hardware accelerators. In this paper, we first propose Fixed Nonzero-ratio Viterbi-based Pruning, which can reduce the memory footprint of LSTM models by 96% with negligible accuracy loss. By applying additional constraints on the distribution of surviving weights in Viterbi-based Pruning, the proposed pruning scheme mitigates the load-imbalance problem and thereby increases the processing engine utilization rate. We then propose V-LSTM, an efficient sparse LSTM accelerator based on the proposed pruning scheme. The high compression ratio of the proposed pruning scheme allows the accelerator to achieve 24.9% lower per-sample latency than that of state-of-the-art accelerators. The proposed accelerator is implemented on a Xilinx VC-709 FPGA evaluation board running at 200 MHz for evaluation.
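A minimal sketch of the fixed nonzero-ratio idea described above, under stated assumptions: the paper's actual method is Viterbi-based and is not reproduced here; this example uses simple per-row magnitude pruning only to illustrate the constraint that every row keeps the same number of surviving weights, which balances the load across processing engines. The function name `fixed_nonzero_ratio_prune` and the `keep_ratio` parameter are hypothetical.

```python
# Illustrative sketch (NOT the paper's Viterbi-based algorithm): enforce a
# fixed nonzero ratio per weight-matrix row so that every processing engine
# assigned a row performs the same number of multiply-accumulates.
import numpy as np

def fixed_nonzero_ratio_prune(W: np.ndarray, keep_ratio: float = 0.04) -> np.ndarray:
    """Keep the same number of largest-magnitude weights in every row of W."""
    rows, cols = W.shape
    k = max(1, int(round(cols * keep_ratio)))  # survivors per row (4% kept ~ 96% pruned)
    mask = np.zeros_like(W, dtype=bool)
    # Indices of the k largest-magnitude entries in each row.
    top_k = np.argpartition(np.abs(W), cols - k, axis=1)[:, cols - k:]
    np.put_along_axis(mask, top_k, True, axis=1)
    return W * mask

# Usage: every row ends up with exactly k nonzeros, avoiding load imbalance.
W = np.random.randn(8, 100).astype(np.float32)
W_pruned = fixed_nonzero_ratio_prune(W, keep_ratio=0.04)
print((W_pruned != 0).sum(axis=1))  # -> [4 4 4 4 4 4 4 4]
```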
