Abstract

Over the past two decades, Long Short-Term Memory (LSTM) networks have been used to solve problems that require modeling long sequences, because they can selectively remember patterns over long periods and thus outperform traditional feed-forward neural networks and Recurrent Neural Networks (RNNs) in learning long-term dependencies. However, LSTM is characterized by feedback dependence, which limits the parallelism achievable on general-purpose processors such as CPUs and GPUs. Moreover, in terms of the energy efficiency of data-center applications, the high power consumption of GPU and CPU computing cannot be ignored. To address these problems, the Field Programmable Gate Array (FPGA) is becoming an attractive alternative: its low power consumption and low latency make it well suited to accelerating and optimizing LSTM and other RNNs. This paper proposes an FPGA-based implementation of an LSTM acceleration engine and further optimizes the implementation through fixed-point arithmetic, a systolic array, and lookup tables for nonlinear functions. On this basis, for easy deployment and application, we integrate the proposed acceleration engine into Caffe, one of the most popular deep learning frameworks. Experimental results show that, compared with a CPU and a GPU, the FPGA-based acceleration engine achieves performance improvements of 8.8 and 2.2 times and energy-efficiency improvements of 16.9 and 9.6 times, respectively, within the Caffe framework.
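Two of the optimizations named in the abstract, fixed-point arithmetic and a lookup table (LUT) for nonlinear functions, can be illustrated with a minimal software sketch. The Q8.8 format, the 256-entry table, and the input range below are illustrative assumptions, not the paper's actual design parameters.

```python
# Sketch of fixed-point arithmetic plus a LUT-based sigmoid,
# two common FPGA optimizations mentioned in the abstract.
# Q8.8 format and 256-entry table are assumed for illustration.
import math

FRAC_BITS = 8              # Q8.8: 8 integer bits, 8 fractional bits
SCALE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    """Quantize a float to Q8.8 fixed point."""
    return int(round(x * SCALE))

def to_float(q: int) -> float:
    """Convert a Q8.8 value back to float."""
    return q / SCALE

# Precompute sigmoid over [-8, 8); inputs outside this range saturate,
# since sigmoid is nearly 0 or 1 there.
LUT_SIZE = 256
LUT_MIN, LUT_MAX = -8.0, 8.0
STEP = (LUT_MAX - LUT_MIN) / LUT_SIZE
SIGMOID_LUT = [to_fixed(1.0 / (1.0 + math.exp(-(LUT_MIN + i * STEP))))
               for i in range(LUT_SIZE)]

def sigmoid_fixed(q: int) -> int:
    """Approximate sigmoid on a Q8.8 input via table lookup."""
    x = to_float(q)
    if x < LUT_MIN:
        return 0
    if x >= LUT_MAX:
        return to_fixed(1.0)
    idx = int((x - LUT_MIN) / STEP)
    return SIGMOID_LUT[idx]
```

On hardware, the table lookup replaces the expensive exponential with a single memory read, and fixed-point values replace floating-point units with cheap integer logic; the trade-off is quantization error bounded by the table step and the fractional precision.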

Highlights

  • As one of the most difficult problems in data science, sequence prediction, such as speech recognition [1] and language understanding [2], has been around for a long time

  • To overcome the challenges of computing and energy efficiency, we propose an FPGA-based Long Short-Term Memory (LSTM) acceleration engine and integrate it into the Caffe framework [8] to make the LSTM network easier to deploy

  • To take advantage of the convenience and flexibility of Caffe, we propose a general Caffe-based LSTM acceleration system in which the CPU and a Field Programmable Gate Array (FPGA) cooperate

Summary

Introduction

As one of the most difficult problems in data science, sequence prediction, such as speech recognition [1] and language understanding [2], has been studied for a long time. With the technical breakthroughs of data science, especially deep learning networks, the LSTM network [3] has gradually become an effective solution for almost all sequence problems. LSTMs are widely used in many sequence modeling tasks, including many natural language processing tasks. Like other deep learning networks, the LSTM network's model size is constantly increasing to improve its accuracy. In addition to the high power consumption of GPUs, the inherent recurrent characteristics of LSTM become the main bottleneck for parallel processing on GPUs.
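The recurrent bottleneck described above can be made concrete with a toy LSTM cell: each time step reads the previous step's hidden and cell states, so the steps form a serial chain that cannot be computed in parallel across time. The scalar state and the weight names below are illustrative, not the formulation of any particular framework.

```python
# Minimal scalar LSTM step showing the feedback dependence:
# h_t and c_t are computed from h_{t-1} and c_{t-1}, so time
# steps must run sequentially. Weights/sizes are toy examples.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step for scalar input and state."""
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate
    c = f * c_prev + i * g       # new cell state needs c_{t-1}
    h = o * math.tanh(c)         # new hidden state needs the new c
    return h, c

def run_sequence(xs, w):
    """This loop is the serial dependence: step t waits for step t-1."""
    h, c = 0.0, 0.0
    for x in xs:
        h, c = lstm_step(x, h, c, w)
    return h
```

A GPU can parallelize the matrix multiplications inside one step, but the loop in `run_sequence` stays sequential; an FPGA pipeline can instead hide that latency by streaming the per-step computation through dedicated hardware.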
