Abstract

In implementing a convolutional neural network (CNN)-based object detection system, the primary issues are power dissipation and limited throughput. Even with ultra-low-power devices, dynamic power dissipation remains difficult to resolve. During the operation of a CNN algorithm, several factors contribute to this: the heat generated by the massive computational complexity, the bottleneck caused by data transfer over limited bandwidth, and the power dissipated by redundant data accesses. This article proposes low-power techniques, applies them to a CNN accelerator in both the FPGA and ASIC design flows, and evaluates them on the Xilinx ZCU-102 FPGA SoC hardware platform and in a 45 nm technology for ASIC, respectively. Our proposed low-power techniques are applied at the register-transfer level (RT-level), targeting both FPGA and ASIC. We achieve up to a 53.21% power reduction in the ASIC implementation and save 32.72% of the dynamic power dissipation in the FPGA implementation. This shows that our RTL low-power schemes offer strong potential for dynamic power reduction when applied to the FPGA and ASIC design flows for implementing a CNN-based object detection system.
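To give a concrete flavor of the RT-level schemes the paper covers (the Background section lists clock-gating variants such as local explicit clock gating), below is a minimal Verilog sketch of latch-based local explicit clock gating. The module name, signal names, and bit width are illustrative assumptions, not taken from the paper's accelerator.

```verilog
// Minimal sketch of latch-based local explicit clock gating.
// All names (gated_regbank, wr_en, din, dout) are hypothetical.
module gated_regbank (
    input  wire        clk,
    input  wire        rst_n,
    input  wire        wr_en,   // write enable; register is idle when low
    input  wire [15:0] din,
    output reg  [15:0] dout
);
    reg  en_latch;
    wire gclk;

    // Capture the enable while clk is low so gclk cannot glitch
    // if wr_en changes while clk is high.
    always @(clk or wr_en)
        if (!clk)
            en_latch <= wr_en;

    // The register receives clock edges only when a write is pending,
    // so no dynamic power is spent toggling the flops on idle cycles.
    assign gclk = clk & en_latch;

    always @(posedge gclk or negedge rst_n)
        if (!rst_n)
            dout <= 16'd0;
        else
            dout <= din;

    // The "local explicit clock enable" alternative keeps clk running
    // and gates only the data path instead:
    //   always @(posedge clk) if (wr_en) dout <= din;
endmodule
```

In an ASIC flow this gating function is typically instantiated as a dedicated integrated clock gating (ICG) cell from the standard-cell library, whereas FPGA tools usually map enables onto the flip-flops' dedicated clock-enable pins.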

Highlights

  • Among the machine learning algorithms, the convolutional neural network (CNN) model is currently one of the most popular architectures

  • With the growing usage of CNN-based Internet of Things (IoT) products, including autonomous vehicles, companies are developing and releasing customized chips of various sizes to support the massive amount of CNN computation, such as the Tensor Processing Unit (TPU), deep learning processing unit (DPU), holographic processing unit (HPU), image processing unit (IPU), neural network processing unit (NPU), and vision processing unit (VPU) [13]

  • We demonstrate our proposed techniques on the CNN accelerator, which consumes the most power in the CNN architecture


Summary

Introduction

Among the machine learning algorithms, the convolutional neural network (CNN) model is currently one of the most popular architectures. To support CNN workloads, hardware has been developed that can process large amounts of data simultaneously or in parallel; this increased capability inevitably causes a large amount of power consumption. With the growing usage of CNN-based Internet of Things (IoT) products, including autonomous vehicles, companies are developing and releasing customized chips of various sizes to support the massive amount of CNN computation, such as the Tensor Processing Unit (TPU), deep learning processing unit (DPU), holographic processing unit (HPU), image processing unit (IPU), neural network processing unit (NPU), and vision processing unit (VPU) [13]. On the algorithm side, to achieve high-performance and high-throughput results, most researchers and developers have proposed novel CNN architectures and efficient memory structures that decrease processing time and improve parallel computing performance, thereby increasing power efficiency [15,16,17,18]. We evaluate power consumption in a reliable experimental environment that includes an FPGA platform and an ASIC design flow.

Background
Clock Gating
Local Explicit Clock Enable
Local Explicit Clock Gating
Bus-Specific Clock Gating
Enhanced Clock Gating
Memory Split
Proposed CNN Accelerator
Practical Application of the Industrial CNN Accelerator
Experiment Results
Testing Environment
FPGA Implementation Result
ASIC Implementation Result
Conclusions
