Abstract

Recurrent Neural Networks (RNNs) have emerged as one of the most popular neural network architectures for time-series problems and are widely used in machine translation, automatic speech recognition, and other natural language processing applications. However, conventional RNNs suffer from vanishing and exploding gradients, which degrade performance in applications with long-term input dependencies. Long Short-Term Memory (LSTM), a variant of the RNN, was proposed to tackle this issue. Nevertheless, LSTM introduces gating units and many additional parameters, which makes it challenging to implement directly on resource-limited platforms such as Field Programmable Gate Arrays (FPGAs). This work first investigates the overall maximum achievable compression rates of the different gating units and their correlations. A Gating Units Level Balanced Compression (GBC) strategy is then proposed. After Top-<inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$k$ </tex-math></inline-formula> pruning, the proposed GBC strategy attains a compression rate of <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$36.6\times $ </tex-math></inline-formula> for LSTM. Furthermore, theoretical analysis indicates that existing gating-unit-level LSTM compression variants leave room for additional compression under the GBC strategy. To verify this analysis, a complementary GBC compression is applied to the existing coupled-gate LSTM.
Experimental results show that GBC achieves an additional <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$32\times $ </tex-math></inline-formula> (overall <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">$42.7\times $ </tex-math></inline-formula>) compression rate with negligible accuracy loss. Finally, hardware experiments conducted on a Xilinx ADM-PCIE-7V3 FPGA demonstrate that the accelerator designed in this paper improves energy efficiency by 7.4%-191.5% compared to state-of-the-art designs.
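
The Top-$k$ pruning mentioned above keeps only the largest-magnitude weights and zeroes the rest. A minimal sketch of one common per-row variant is below; the paper's exact pruning granularity, schedule, and retraining procedure are not specified in this abstract, so this is illustrative only.

```python
import numpy as np

def top_k_prune(weights, k):
    """Keep the k largest-magnitude entries in each row of a weight
    matrix, zeroing the rest (illustrative sketch; the paper's exact
    granularity and retraining schedule may differ)."""
    pruned = np.zeros_like(weights)
    # Column indices of the k largest |w| in each row.
    idx = np.argsort(-np.abs(weights), axis=1)[:, :k]
    rows = np.arange(weights.shape[0])[:, None]
    pruned[rows, idx] = weights[rows, idx]
    return pruned

# Example: prune an 8x16 gate weight matrix to 2 nonzeros per row,
# an 8x reduction in nonzero count.
W = np.random.randn(8, 16)
W_sparse = top_k_prune(W, k=2)
```

Pruning each gate's weight matrix to a different $k$, balanced against that gate's sensitivity, is the intuition behind the gating-unit-level strategy the abstract describes.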
