Abstract
This paper presents an approximate computing method for long short-term memory (LSTM) operations in energy-efficient end-to-end speech recognition. We introduce the concept of a similarity score, which measures how similar the inputs of two adjacent LSTM cells are. When two inputs are highly similar, we disable the corresponding LSTM operation and directly reuse the prior result, reducing the computational cost of speech recognition. We additionally define a pseudo-LSTM operation that provides approximate computation at reduced processing resolution, further relaxing the processing overhead without degrading accuracy. To verify the proposed idea, we design an approximate LSTM accelerator in a 65 nm CMOS process. The proposed accelerator utilizes a number of approximate processing elements (PEs) that support the proposed skipped-LSTM and pseudo-LSTM operations without degrading energy efficiency. Moreover, sparsity-aware scheduling is enabled by a small on-chip SRAM buffer. As a result, the proposed work provides an energy-efficient yet still accurate speech recognition system that consumes 2.19 times less energy than the baseline architecture.
Highlights
In the last few years, speech recognition using deep neural networks (DNNs) has become an important challenge in the internet-of-things (IoT) community as well as the artificial intelligence (AI) industry [1]. In general, to handle sequential data such as voice signals, the recurrent neural network (RNN) is well suited to improving recognition accuracy by exploiting prior history during inference [2].
To further relax the computational cost at the algorithm level, we present a novel approximate computation of the long short-term memory (LSTM) cell for cases where the LSTM inputs are relatively similar, but not similar enough to disable the cell operation entirely.
Each processing element is a multiply-accumulate (MAC) operator that flexibly supports two different computing resolutions according to the LSTM processing mode delivered from the similarity check module.
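To illustrate the dual-resolution behavior described above, the following is a minimal sketch of a MAC processing element that switches between a full-resolution mode and a reduced-resolution pseudo-LSTM mode. The function names and the 8-bit/4-bit fixed-point split are illustrative assumptions, not the paper's actual bit-widths.

```python
# Hypothetical dual-resolution MAC PE sketch.
# Assumption: 8-bit full mode, 4-bit pseudo-LSTM mode (illustrative only).

def quantize(x, bits):
    """Round x to a signed fixed-point value with `bits` bits in [-1, 1)."""
    scale = 1 << (bits - 1)
    return max(-scale, min(scale - 1, round(x * scale))) / scale

def mac(weights, inputs, acc=0.0, mode="full"):
    """Multiply-accumulate; 'pseudo' mode truncates operands to fewer bits."""
    bits = 8 if mode == "full" else 4  # reduced resolution for pseudo-LSTM
    for w, x in zip(weights, inputs):
        acc += quantize(w, bits) * quantize(x, bits)
    return acc
```

In hardware the mode signal would come from the similarity check module; here it is just a keyword argument.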
Summary
In the last few years, speech recognition using deep neural networks (DNNs) has become an important challenge in the internet-of-things (IoT) community as well as the artificial intelligence (AI) industry [1]. To handle sequential data such as voice signals, the recurrent neural network (RNN) is well suited to improving recognition accuracy by exploiting prior history during inference [2]. The proposed work reduces computational complexity through an approximate computing approach that targets complex networks using multi-bit weight representations, which normally achieve higher accuracy than binarized neural network (BNN) models. Starting from our prior work in [14], this paper presents an energy-efficient LSTM accelerator design that performs approximate LSTM operations based on similarity-driven cell activation. The skipping method limits the number of consecutively disabled cells, remarkably reducing the number of activated cells during end-to-end speech recognition while maintaining acceptable accuracy.
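The skipping scheme summarized above can be sketched as follows. This is a hedged illustration only: the similarity metric (cosine similarity), the threshold, and the `max_skip` cap are placeholder assumptions; the paper defines its own similarity score and skip limit.

```python
# Hypothetical sketch of similarity-based LSTM cell skipping with a cap
# on consecutive skips (illustrative metric and parameters).
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def run_sequence(inputs, lstm_step, threshold=0.99, max_skip=2):
    """Skip an LSTM step when its input is highly similar to the previous
    one, reusing the prior output; cap consecutive skips at max_skip."""
    outputs, prev_x, prev_y, skipped = [], None, None, 0
    for x in inputs:
        if (prev_x is not None and skipped < max_skip
                and cosine_similarity(x, prev_x) >= threshold):
            y = prev_y        # reuse prior result (skipped-LSTM)
            skipped += 1
        else:
            y = lstm_step(x)  # full LSTM cell evaluation
            skipped = 0
        outputs.append(y)
        prev_x, prev_y = x, y
    return outputs
```

Capping consecutive skips keeps the reused state from drifting too far from what the full computation would produce.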