Abstract

Recurrent Neural Networks (RNNs) have demonstrated excellent results on various Automatic Speech Recognition (ASR) and Natural Language Processing (NLP) tasks. However, executing RNNs requires substantial memory and computation, which makes it difficult to achieve real-time performance on low-power devices such as smartphones. Hence, ASR and NLP applications such as voice assistants currently rely on cloud-based solutions. In this paper, to enable on-device inference, we propose efficient approximations for the weights of fully connected (FC) layers and for activation functions to reduce computational complexity. The proposed approximations eliminate multiplications, divisions, and exponential operations by replacing them with simple arithmetic operations (shifts and additions), significantly reducing the computation requirements without any perceivable loss of functional accuracy. The approximations also reduce memory size and bandwidth requirements. We further present a lightweight VLIW-based DSP architecture that incorporates these approximations to enable on-device inference. The approximations have been tested on the proposed DSP with various RNN applications such as EESEN, LRCN, and S2VT. The results show accuracies similar to those of the 32-bit float reference, ∼8×–12× performance gains, and ∼2×–4× gains in memory requirement and bandwidth. Moreover, the activation approximation yields better average and peak errors than the state of the art.
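As a concrete illustration of how an exponential-based activation can be replaced by shifts and additions, the sketch below implements a well-known piecewise-linear sigmoid approximation (PLAN, Amin et al., 1997) in Q12 fixed point; every segment slope is a power of two, so each multiply reduces to a right shift. This is a generic illustration of the idea, not the paper's exact approximation scheme, and the Q12 format and test harness are assumptions of this sketch.

```c
#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Sketch only: PLAN piecewise-linear sigmoid in Q12 fixed point.
 * Slopes 0.25, 0.125, 0.03125 are powers of two, so they become
 * right shifts; no multiply, divide, or exp is needed.
 * NOT the paper's exact scheme -- an illustration of the idea. */

#define Q 12                  /* Q12 format: 4096 represents 1.0 */
#define ONE_Q (1 << Q)

static int32_t sigmoid_plan_q12(int32_t x)
{
    int32_t ax = x < 0 ? -x : x;   /* |x| in Q12 */
    int32_t y;

    if (ax >= 5 * ONE_Q)                    /* |x| >= 5: saturate */
        y = ONE_Q;
    else if (ax >= (19 * ONE_Q) / 8)        /* 2.375 <= |x| < 5   */
        y = (ax >> 5) + (int32_t)(0.84375 * ONE_Q);
    else if (ax >= ONE_Q)                   /* 1 <= |x| < 2.375   */
        y = (ax >> 3) + (int32_t)(0.625 * ONE_Q);
    else                                    /* 0 <= |x| < 1       */
        y = (ax >> 2) + (ONE_Q >> 1);

    /* Exploit symmetry: sigmoid(-x) = 1 - sigmoid(x). */
    return x < 0 ? ONE_Q - y : y;
}

int main(void)
{
    /* Measure peak error against the libm sigmoid over [-8, 8]. */
    double max_err = 0.0;
    for (double t = -8.0; t <= 8.0; t += 1.0 / 64) {
        int32_t xq    = (int32_t)lrint(t * ONE_Q);
        double approx = sigmoid_plan_q12(xq) / (double)ONE_Q;
        double exact  = 1.0 / (1.0 + exp(-t));
        double err    = fabs(approx - exact);
        if (err > max_err) max_err = err;
    }
    printf("max abs error vs. exact sigmoid: %f\n", max_err);
    return 0;
}
```

Run on this grid, the peak error of the PLAN scheme is on the order of 2e-2; the paper's own approximations report lower average and peak errors than such prior art.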
