Shrink and Eliminate: A Study of Post-Training Quantization and Repeated Operations Elimination in RNN Models

Nesma M Rezk, Zain Ul-Abdin, Tomas Nordström

doi:10.3390/info13040176

Abstract

Recurrent neural networks (RNNs) are neural networks (NNs) designed for time-series applications. There is growing interest in running RNNs on edge devices to support these applications. However, RNNs have large memory and computational demands that make them challenging to implement on edge devices. Quantization shrinks the size and computational needs of such models by reducing the precision of weights and activations. Further, the delta networks method increases the sparsity of activation vectors by relying on the temporal relationship between successive input sequences to eliminate repeated computations and memory accesses. In this paper, we study the effect of quantization on LSTM-, GRU-, LiGRU-, and SRU-based RNN models for speech recognition on the TIMIT dataset. We show how to apply post-training quantization to these models with a minimal increase in the error by skipping quantization of selected paths. In addition, we show that quantizing activation vectors in RNNs to integer precision leads to considerable sparsity if the delta networks method is applied. Then, we propose a method for increasing the sparsity in the activation vectors while minimizing the error and maximizing the percentage of eliminated computations. The proposed quantization method compressed the four models by more than 85%, with an error increase of 0.6, 0, 2.1, and 0.2 percentage points, respectively. By applying the delta networks method to the quantized models, more than 50% of the operations can be eliminated, in most cases with only a minor increase in the error. Comparing the four models under quantization and the delta networks method, we found that compressed LSTM-based models are the best solutions under low-error-rate constraints. The compressed SRU-based models are the smallest in size and are suitable when higher error rates are acceptable, and the compressed LiGRU-based models have the highest number of eliminated operations.
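
To make the interaction between integer quantization and the delta networks method concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the quantization scheme, bit widths, or thresholds, so the symmetric linear quantizer, the 4-bit width, the scale of 0.1, the zero threshold, and the names `quantize` and `delta_matvec` are all assumptions chosen for illustration. The sketch shows a matrix-vector product that is updated only for input elements whose quantized value changed since the last time step, and it counts how many column updates are skipped.

```python
# Illustrative sketch only: the quantizer, bit width, scale, and threshold
# are assumptions, not the configuration used in the paper.

def quantize(values, scale, bits=8):
    """Symmetric linear quantization of floats to signed integers (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax - 1, min(qmax, round(v / scale))) for v in values]


def delta_matvec(weights, x_seq, threshold=0):
    """Compute W @ x_t for every time step, touching only columns whose input changed.

    Returns (outputs, skipped_fraction), where skipped_fraction is the share of
    column updates (multiplies and weight fetches) that were eliminated.
    """
    rows, cols = len(weights), len(weights[0])
    x_ref = [0] * cols   # last input values that were actually propagated
    y = [0] * rows       # running value of W @ x_ref
    outputs = []
    skipped = total = 0
    for x in x_seq:
        for j in range(cols):
            total += 1
            delta = x[j] - x_ref[j]
            if abs(delta) <= threshold:
                skipped += 1          # no change: skip column j entirely
                continue
            for i in range(rows):
                y[i] += weights[i][j] * delta
            x_ref[j] = x[j]           # remember the propagated value
        outputs.append(list(y))
    return outputs, skipped / total


# Hypothetical integer weights and slowly varying float inputs.
W = [[1, -2, 3], [0, 4, -1]]
frames = [[0.11, 0.52, -0.31], [0.12, 0.49, -0.30], [0.40, 0.50, -0.30]]

# Small float changes collapse to identical integers after quantization,
# which is what turns the deltas into exact zeros.
x_seq = [quantize(f, scale=0.1, bits=4) for f in frames]
outputs, skipped = delta_matvec(W, x_seq)
```

With a threshold of zero the result at every step equals the full product, so the savings in this sketch come only from repeated quantized inputs. A positive threshold would skip more updates at the cost of an approximation error, which is the trade-off the abstract describes.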

Journal: Information	Publication Date: Mar 31, 2022
Citations: 2	License type: CC BY 4.0

Similar Papers

On RNN Models for Solving Dynamic System of Linear Equations
Huiyan Lu ... Jiliang Zhang
01 Dec 2019

Sequence-based statistical downscaling and its application to hydrologic simulations based on machine learning and big data
Qingrui Wang ... Xinghui Xia
Journal of Hydrology | VOL. 586
21 Mar 2020

MOHAQ: Multi-Objective Hardware-Aware Quantization of recurrent neural networks
Nesma M Rezk ... Ahmed Hemani
Journal of Systems Architecture | VOL. 133
04 Nov 2022

Is the LSTM Model Better than RNN for Flood Forecasting Tasks? A Case Study of HuaYuankou Station and LouDe Station in the Lower Yellow River Basin
Yiyang Wang ... Dongmei Xu
Water | VOL. 15
10 Nov 2023
