Abstract

Recurrent neural networks (RNNs) perform well on sequence processing tasks, but their internal fully connected topologies demand complex computation and intensive memory traffic, making RNNs difficult to implement on embedded devices. In this brief, we propose an energy-efficient RNN processor that exploits the data locality of network compression through a novel quantized sparse matrix encoding format. Compared with conventional processors for compressed RNNs, it further reduces weight fetches and matrix–vector multiplications by more than 80% in applications such as natural language processing and keyword spotting. To handle RNN models of different scales without introducing significant interaction overhead, a scalable hardware architecture is presented that organizes multiple processing engines in a spatial fashion with the assistance of a network cross-division strategy. Synthesized in the SMIC 40LL CMOS process, the prototype processor occupies a total area of 0.65 mm² with 95.5 kB of static random-access memory. In simulation, the processor achieves a peak performance of 24 GOPS and dissipates 6.16 mW at a 1.1 V supply and 200 MHz, reaching a peak energy efficiency of 3.89 GOPS/mW, which is state of the art among existing RNN accelerators.
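The abstract does not spell out the proposed encoding format, so the following is only a rough illustrative sketch of the general mechanism it relies on: storing a pruned weight matrix in a compressed form (here, a standard CSR-like layout) so that only nonzero weights are fetched and multiplied. The struct layout, field names, and the `sparse_matvec` function below are assumptions for illustration, not the paper's design.

```c
/* Minimal CSR-style sparse matrix-vector multiply (y = W * x).
 * Illustrates why a sparse encoding cuts both weight fetches and
 * multiply-accumulate operations: only stored nonzeros are touched. */
#include <stddef.h>

typedef struct {
    size_t       rows;
    const int   *row_ptr;  /* rows + 1 entries: start of each row in cols/vals */
    const int   *cols;     /* column index of each nonzero weight */
    const float *vals;     /* nonzero weight values (after dequantization) */
} csr_matrix;

void sparse_matvec(const csr_matrix *w, const float *x, float *y)
{
    for (size_t r = 0; r < w->rows; ++r) {
        float acc = 0.0f;
        for (int k = w->row_ptr[r]; k < w->row_ptr[r + 1]; ++k)
            acc += w->vals[k] * x[w->cols[k]];  /* one fetch, one MAC per nonzero */
        y[r] = acc;
    }
}
```

A dense implementation fetches and multiplies all rows × cols weights; here the cost scales with the number of stored nonzeros, which is the general source of the reported savings in weight fetching and matrix–vector multiplication.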
