Abstract

This paper presents an approximate computing method for long short-term memory (LSTM) operations, enabling energy-efficient end-to-end speech recognition. We introduce the concept of a similarity score, which measures how similar the inputs of two adjacent LSTM cells are to each other. We then disable the highly similar LSTM operations and directly transfer the prior results to reduce the computational cost of speech recognition. In addition, the pseudo-LSTM operation is defined to provide approximate computation at a reduced processing resolution, which further relaxes the processing overhead without degrading the accuracy. To verify the proposed idea, we design an approximate LSTM accelerator in a 65 nm CMOS process. The proposed accelerator utilizes a number of approximate processing elements (PEs) to support the proposed skipped-LSTM and pseudo-LSTM operations without degrading the energy efficiency. Moreover, sparsity-aware scheduling is enabled by a small on-chip SRAM buffer. As a result, the proposed work provides an energy-efficient yet still accurate speech recognition system, which consumes 2.19 times less energy than the baseline architecture.
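
As a rough illustration of the flow described above, the following Python sketch chooses between a regular, a pseudo (reduced-resolution), and a skipped LSTM step for each time frame. The cosine-similarity metric, the two thresholds, and the step-function interface are illustrative assumptions, not details taken from the paper.

    # Minimal sketch (not the authors' code): similarity-based selection of LSTM modes.
    import numpy as np

    SKIP_TH = 0.98    # assumed: above this, reuse the previous cell outputs (skipped-LSTM)
    PSEUDO_TH = 0.90  # assumed: above this, run a reduced-resolution step (pseudo-LSTM)

    def similarity(x_prev, x_curr):
        """Cosine similarity between two adjacent LSTM input frames (assumed metric)."""
        denom = np.linalg.norm(x_prev) * np.linalg.norm(x_curr) + 1e-8
        return float(np.dot(x_prev, x_curr) / denom)

    def run_sequence(xs, full_step, pseudo_step):
        """Process a sequence, picking full, pseudo, or skipped LSTM per time step.

        full_step / pseudo_step are user-supplied cell functions; they are expected
        to treat (h, c) == (None, None) as the initial state.
        """
        h = c = None
        x_prev = None
        outputs = []
        for x in xs:
            score = similarity(x_prev, x) if x_prev is not None else 0.0
            if score >= SKIP_TH and h is not None:
                pass                          # skipped-LSTM: transfer prior h, c directly
            elif score >= PSEUDO_TH:
                h, c = pseudo_step(x, h, c)   # pseudo-LSTM: reduced-resolution computation
            else:
                h, c = full_step(x, h, c)     # regular full-precision LSTM cell
            outputs.append(h)
            x_prev = x
        return outputs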

Highlights

  • In the last few years, speech recognition using deep neural networks (DNNs) has become an important challenge in the internet-of-things (IoT) community as well as the artificial intelligence (AI) industry [1]. In general, to handle sequential data such as voice signals, the recurrent neural network (RNN) is well suited to improving recognition accuracy by exploiting prior histories during inference processing [2].

  • To further relax the computational costs at the algorithm level, we present a novel approximate computing scheme for the long short-term memory (LSTM) cell, applied when the LSTM inputs are relatively similar but not similar enough to disable their cell operations.

  • Each processing element is essentially a multiply-accumulate (MAC) operator that flexibly supports two different computing resolutions according to the LSTM processing mode delivered from the similarity check module (see the sketch after this list).
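
A minimal sketch of such a dual-resolution PE is given below, assuming the two computing resolutions are realized by simple fixed-point quantization of the MAC operands; the bit widths and rounding scheme are illustrative assumptions, not the paper's design parameters.

    # Minimal sketch: one MAC step at full or reduced resolution, selected by mode.
    import numpy as np

    def quantize(v, frac_bits):
        """Round values to a fixed-point grid with the given number of fractional bits."""
        scale = 1 << frac_bits
        return np.round(np.asarray(v) * scale) / scale

    def mac(weights, inputs, acc, mode):
        """One PE step: multiply-accumulate, with precision set by the LSTM mode."""
        frac_bits = 15 if mode == "full" else 7   # assumed widths; pseudo mode drops precision
        w = quantize(weights, frac_bits)
        x = quantize(inputs, frac_bits)
        return acc + float(np.dot(w, x))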

Summary

Introduction

In the last few years, speech recognition using deep neural networks (DNNs) has become an important challenge in the internet-of-things (IoT) community as well as the artificial intelligence (AI) industry [1]. To handle sequential data such as voice signals, the recurrent neural network (RNN) is well suited to improving recognition accuracy by exploiting prior histories during inference processing [2]. The proposed work reduces computational complexity through an approximate computing approach that targets complicated networks using multiple bits to represent weight values, which normally provide higher accuracy than binarized neural network (BNN) models. Starting from our prior work in [14], this paper presents an energy-efficient LSTM accelerator design that performs approximate LSTM operations based on similarity-based cell activation. The skipping method is applied while limiting the number of consecutive disabled cells, remarkably reducing the number of activated cells during end-to-end speech recognition with acceptable accuracy.
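
The mode-selection rule with a cap on consecutive skips could look like the sketch below; the cap value and thresholds are illustrative assumptions, not the limits reported in the paper.

    # Minimal sketch: force a full LSTM step after too many consecutive skips,
    # so that approximation errors cannot accumulate over long spans.
    MAX_CONSEC_SKIPS = 3  # assumed tunable cap, not a published value

    def decide_mode(score, skips_so_far, skip_th=0.98, pseudo_th=0.90):
        """Return the LSTM processing mode for one time step."""
        if score >= skip_th and skips_so_far < MAX_CONSEC_SKIPS:
            return "skip"     # reuse prior cell outputs
        if score >= pseudo_th:
            return "pseudo"   # reduced-resolution LSTM step
        return "full"         # regular LSTM step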

DNN-Based End-to-End Speech Recognition
DeepSpeech Network and LSTM Operations
Similarity-Based LSTM Operation
Cell-Skipping Method Using Similarity Score
Pseudo-Skipping Method for Approximate LSTM Operations
Accelerator Design for Approximate LSTM Processing
Algorithm-Level Performance
Prototype Implementation
Findings
Conclusions