Abstract

Processing-in-memory (PIM) architecture based on resistive random access memory (ReRAM) crossbars is a promising solution to the memory bottleneck that long short-term memory (LSTM) faces. Based on the dataflow analysis of the LSTM computing paradigm, this article proposes to adopt the ReRAM-based analog approximate computing to conduct the LSTM-specific element-wise computation. Combined with the dot-product computation implemented with ReRAM crossbars, a new LSTM processing tile is designed to significantly reduce the demand for analog-to-digital converters (ADCs), which is the major part of power consumption of existing designs. Next, we elaborate on a mapping scheme to efficiently deploy large-scale LSTM onto multiple processing tiles. Finally, an architecture enhancement is proposed to support crossbar-friendly LSTM pruning to further improve efficiency. This overall design, named ERA-LSTM, is presented. Our evaluation shows that it can outperform two state-of-the-art FPGA-based LSTM accelerators by 103.6 and 35.9 times, respectively; compared with a state-of-the-art ReRAM-based LSTM accelerator with digital element-wise computation, it is 6.1 times more efficient. Moreover, our experiments demonstrate that the impact of hardware constraints and approximation errors on the inference accuracy can be effectively reduced by the proposed fine-tuning scheme and by optimizing the design of the approximator.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call