Abstract

Recurrent Neural Networks (RNNs) have demonstrated excellent results on various Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) tasks. However, executing RNNs requires substantial memory and computation, which makes it difficult to achieve real-time performance on low-power devices such as smartphones. Hence, ASR and NLP applications such as voice assistants currently rely on cloud-based solutions. In this paper, to enable on-device inference, we propose efficient approximations for the weights of fully connected (FC) layers and for activation functions to reduce computational complexity. The proposed approximations eliminate multiplications, divisions, and exponential operations by replacing them with simple arithmetic operations (shifts and additions), significantly reducing the computation requirements without any perceivable loss of functional accuracy. The approximations also reduce memory size and bandwidth requirements. We further present a lightweight VLIW-based DSP architecture that incorporates these approximations to enable on-device inference. The approximations have been tested on the proposed DSP with various RNN applications such as EESEN, LRCN, and S2VT. The results show accuracies similar to those of the 32-bit float reference, ∼8×–12× performance gains, and ∼2×–4× gains in memory requirement and bandwidth. Moreover, the activation approximation yields better average and peak errors than the state of the art.
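As a concrete illustration of how an exponential-based activation can be replaced by shifts and additions, the sketch below implements a well-known piecewise-linear sigmoid approximation (PLAN, Amin et al., 1997) in Q12 fixed point; every segment slope is a power of two, so each multiply reduces to a right shift. This is a generic illustration of the idea, not the paper's exact approximation scheme, and the Q12 format and test harness are assumptions of this sketch.

```c
#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Sketch only: PLAN piecewise-linear sigmoid in Q12 fixed point.
 * Slopes 0.25, 0.125, 0.03125 are powers of two, so they become
 * right shifts; no multiply, divide, or exp is needed.
 * NOT the paper's exact scheme -- an illustration of the idea. */

#define Q 12                  /* Q12 format: 4096 represents 1.0 */
#define ONE_Q (1 << Q)

static int32_t sigmoid_plan_q12(int32_t x)
{
    int32_t ax = x < 0 ? -x : x;   /* |x| in Q12 */
    int32_t y;

    if (ax >= 5 * ONE_Q)                    /* |x| >= 5: saturate */
        y = ONE_Q;
    else if (ax >= (19 * ONE_Q) / 8)        /* 2.375 <= |x| < 5   */
        y = (ax >> 5) + (int32_t)(0.84375 * ONE_Q);
    else if (ax >= ONE_Q)                   /* 1 <= |x| < 2.375   */
        y = (ax >> 3) + (int32_t)(0.625 * ONE_Q);
    else                                    /* 0 <= |x| < 1       */
        y = (ax >> 2) + (ONE_Q >> 1);

    /* Exploit symmetry: sigmoid(-x) = 1 - sigmoid(x). */
    return x < 0 ? ONE_Q - y : y;
}

int main(void)
{
    /* Measure peak error against the libm sigmoid over [-8, 8]. */
    double max_err = 0.0;
    for (double t = -8.0; t <= 8.0; t += 1.0 / 64) {
        int32_t xq    = (int32_t)lrint(t * ONE_Q);
        double approx = sigmoid_plan_q12(xq) / (double)ONE_Q;
        double exact  = 1.0 / (1.0 + exp(-t));
        double err    = fabs(approx - exact);
        if (err > max_err) max_err = err;
    }
    printf("max abs error vs. exact sigmoid: %f\n", max_err);
    return 0;
}
```

Run on this grid, the peak error of the PLAN scheme is on the order of 2e-2; the paper's own approximations report lower average and peak errors than such prior art.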
