Abstract
Recent breakthroughs in deep convolutional neural networks (CNNs) have led to great improvements in the accuracy of both vision and auditory systems. Characterized by their deep structures and large numbers of parameters, deep CNNs challenge today's computing platforms. Hardware specialization in the form of field-programmable gate arrays (FPGAs) offers a promising path toward major leaps in computational performance while achieving high energy efficiency. In this paper, we focus on accelerating deep CNNs on the Xilinx Zynq ZQ7045 FPGA SoC. As most of the computational workload can be converted to matrix multiplications, we adopt a matrix-multiplier-based accelerator architecture, with dedicated units designed to eliminate the conversion overhead. We also design a customized memory system tailored to the memory access pattern of CNNs. To make the accelerator easily usable by application developers, it supports Caffe, a widely used software framework for deep CNNs; different CNN models can thus be adopted by our accelerator with good performance portability. The experimental results show that for image classification, a typical CNN application, an average throughput of 77.8 GFLOPS is achieved, while the energy efficiency is 4.7× better than an Nvidia K20 GPGPU. © 2016 The Authors. Concurrency and Computation: Practice and Experience published by John Wiley & Sons Ltd.
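The conversion of convolution into matrix multiplication that the abstract refers to is commonly done with an im2col-style lowering (the standard approach in Caffe); the sketch below is a minimal single-channel illustration of that idea, not the paper's accelerator logic, and all function names here are illustrative.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll each kh-by-kw sliding window of a single-channel
    feature map x (H, W) into one column of the output matrix.
    Returns an array of shape (kh*kw, out_h*out_w)."""
    H, W = x.shape
    out_h, out_w = H - kh + 1, W - kw + 1
    cols = np.empty((kh * kw, out_h * out_w))
    for i in range(out_h):
        for j in range(out_w):
            cols[:, i * out_w + j] = x[i:i + kh, j:j + kw].ravel()
    return cols

def conv_as_gemm(x, k):
    """Valid-mode 2-D convolution (cross-correlation, as in CNNs)
    expressed as a single matrix multiplication over the im2col matrix."""
    kh, kw = k.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return (k.ravel() @ im2col(x, kh, kw)).reshape(out_h, out_w)

# Example: 4x4 input, 2x2 kernel of ones -> 3x3 output of window sums.
x = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((2, 2))
print(conv_as_gemm(x, k))
```

In a real multi-channel layer the kernel matrix has one row per output channel, so the whole layer becomes one large GEMM, which is exactly the shape of workload a matrix-multiplier-based accelerator is built for; the cost of materializing the im2col matrix is the "conversion overhead" the dedicated units are said to eliminate.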