Abstract

To exploit the high accuracy, inherent redundancy, and embarrassingly parallel nature of Convolutional Neural Networks (CNNs) for intelligent embedded systems, many dedicated CNN accelerators have been presented. These accelerators employ compression, tiling, and layer merging optimized for a specific data flow/parallelism pattern. However, the dimensions of a CNN differ widely from one application to another (and also from one layer to another). Therefore, the optimal parallelism and data flow pattern also differs significantly across CNN layers. An efficient accelerator should be flexible enough not only to support different data flow patterns efficiently but also to interleave and cascade them. Achieving this flexibility incurs configuration overheads. This paper analyzes whether the reconfiguration overheads for interleaving and cascading multiple data flow and parallelism patterns are justified. To answer this question, we first design a reconfigurable CNN accelerator, called ReCon, and then compare it with state-of-the-art accelerators. Post-layout synthesis results reveal that ReCon provides up to 2.2X higher throughput and up to 2.3X better energy efficiency at the cost of 26–35% additional area.
