Abstract

This article addresses the growing need in resource-constrained edge computing scenarios for energy-efficient convolutional neural network (CNN) accelerators on mobile Field-Programmable Gate Array (FPGA) systems. In particular, we concentrate on register transfer level (RTL) design flow optimization to improve programming speed and power efficiency. We present a re-configurable accelerator design optimized for CNN-based object-detection applications, especially suitable for mobile FPGA platforms like the Xilinx PYNQ-Z2. By not only optimizing the MAC module using Enhanced clock gating (ECG), the accelerator can also use low-power techniques such as Local explicit clock gating (LECG) and Local explicit clock enable (LECE) in memory modules to efficiently minimize data access and memory utilization. The evaluation using ResNet-20 trained on the CIFAR-10 dataset demonstrated significant improvements in power efficiency consumption (up to 22%) and performance. The findings highlight the importance of using different optimization techniques across multiple hardware modules to achieve better results in real-world applications.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call