Abstract

Convolutional Neural Networks (CNNs) have a wide range of applications due to their superior performance in image and pattern classification. However, the performance of CNNs comes at the price of high computational load and memory bandwidth usage. Hardware acceleration has become the primary way to tackle this ever-increasing complexity of CNNs. Most of the recent accelerators arrange processing units (PEs) as a many-core accelerator architecture, with the inter-PE connections tailored to the specific dataflow of the CNN layers. The performance of such accelerators is maximized if the input feature map and filter size/dimension matches that of the underlying accelerator. However, current fixed-size accelerator structures lead to sever resource underutilization because the same structure is used to compute CNN layers of varying dimensions. In this paper, we tackle this problem by presenting RC-CNN, a reconfigurable accelerator architecture for CNNs that can adapt the structure of accelerator to the size and dataflow pattern of the running CNN layer. RC-CNN relies on a reconfigurable on-chip interconnection fabric that can organize a sub-set of accelerator’s PEs as a PE set with the same size/dimension of the target CNN layer and customize the inter-PE connections for the layer’s dataflow pattern. Since the area/energy overhead does not justify using a full-fledged packet-switched network in accelerators with fine-grained PEs, we use a reconfigurable network with very simple switches in order to efficiently implement the dynamic reconfiguration capability for many-core fine-grained CNN accelerators. Experimental results show that, based on the CNN size and accelerator structure, RC-CNN yields 37% higher PE utilization over a baseline design, on average. It also improves the PE utilization of the state-of-the-art CNN accelerators we selected for comparison purpose by 18%, on average. The results show that these improvements translate to 9%–41% increase in the accelerator’s throughput. Further, RC-CNN reduces the network latency and energy consumption by 28% and 22%, respectively, compared to the state-of-the-art utilization-aware methods that employ packet-switched networks-on-chip.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.