Abstract

To exploit the high accuracy, inherent redundancy, and embarrassingly parallel nature of Convolutional Neural Networks (CNNs) for intelligent embedded systems, many dedicated CNN accelerators have been presented. These accelerators employ compression, tiling, and layer merging optimized for a specific data flow/parallelism pattern. However, the dimensions of a CNN differ widely from one application to another (and also from one layer to another). Therefore, the optimal parallelism and data flow pattern also differs significantly across CNN layers. An efficient accelerator should be flexible enough not only to support different data flow patterns efficiently but also to interleave and cascade them. Achieving this flexibility incurs configuration overheads. This paper analyzes whether the reconfiguration overheads for interleaving and cascading multiple data flow and parallelism patterns are justified. To answer this question, we first design a reconfigurable CNN accelerator, called ReCon, and then compare it with state-of-the-art accelerators. Post-layout synthesis results reveal that ReCon provides up to 2.2X higher throughput and up to 2.3X better energy efficiency at the cost of 26–35% additional area.
