Abstract

Long Short-Term Memory (LSTM) is a powerful recurrent neural network architecture that has been shown to provide state-of-the-art performance in various sequence learning tasks, including natural language processing, video classification, and speech recognition. Once an LSTM model has been trained on a dataset, its utility comes from its ability to infer information from completely new data. Because LSTM models are highly complex, the inference stage can require significant computing power and memory resources in order to keep up with a real-time workload. Many approaches have been taken to accelerate inference, from offloading computations to GPUs or other specialized hardware, to compressing model parameters in order to reduce the number of computations and the memory footprint. This work takes a two-pronged approach to accelerating LSTM inference. First, a model compression scheme called binarization is identified that both reduces the storage size of model parameters and simplifies computations. This technique is applied when training LSTM models for two separate sequence learning tasks, and it is shown to provide prediction performance comparable to that of the uncompressed models. Then, a digital processor architecture called the Binary Recurrent Unit (BRU) is proposed to accelerate inference for binarized LSTM models. Specifically targeted at FPGA implementation, this accelerator takes advantage of binary model weights and on-chip memory resources in order to parallelize LSTM inference computations. The BRU architecture is implemented and tested on a Xilinx Z7020 device clocked at 200 MHz. Inference computation time for BRU is evaluated against that of CPU and GPU inference implementations. BRU is shown to outperform the CPU and GPU implementations by as much as 39 times and 3.8 times, respectively.
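
As a rough illustration of why binarization both shrinks parameter storage and simplifies computation, the following Python sketch assumes a simple deterministic sign-based binarization to {-1, +1}. The abstract does not specify the exact scheme, so this is an assumption, and the function names (`binarize`, `pack_bits`, `binary_matvec`) are hypothetical rather than taken from this work.

```python
import numpy as np

def binarize(w):
    """Map real-valued weights to {-1, +1} by sign (assumed scheme; 0 maps to +1)."""
    return np.where(w >= 0, 1, -1).astype(np.int8)

def pack_bits(w_bin):
    """Store each +/-1 weight as a single bit (1 for +1, 0 for -1)."""
    return np.packbits(w_bin > 0, axis=-1)

def binary_matvec(w_bin, x):
    """Compute w_bin @ x without multiplications: add inputs where the
    weight is +1 and subtract them where the weight is -1."""
    pos = np.where(w_bin > 0, x, 0.0).sum(axis=1)
    neg = np.where(w_bin < 0, x, 0.0).sum(axis=1)
    return pos - neg

# Hypothetical toy sizes: 4 outputs, 16 inputs.
rng = np.random.default_rng(0)
w = rng.standard_normal((4, 16)).astype(np.float32)
x = rng.standard_normal(16).astype(np.float32)

w_bin = binarize(w)
packed = pack_bits(w_bin)

# 32-bit floats versus 1 bit per weight: a 32x reduction in storage.
print(w.nbytes, packed.nbytes)
print(binary_matvec(w_bin, x))
```

With one bit per weight, a matrix-vector product reduces to selective additions and subtractions, which is the kind of operation that maps well onto parallel FPGA logic and on-chip memory.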
