Abstract

Convolutional neural networks (CNNs) are widely deployed for many artificial intelligence (AI) applications, such as object detection and image classification. Due to the burgeoning revolution in edge AI, CNN hardware accelerators are also being employed in resource-constrained edge devices to achieve better performance and energy efficiency at the edge. Although CNN accelerators enable fast and energy-efficient CNN inference at the edge, the hardware resources on the edge device other than the CNN accelerator remain idle, even though they could otherwise be utilized to attain even better performance and energy efficiency for CNN inference. In this paper, we propose a CPU-accelerator co-scheduling technique for convolution (CONV) layer operations of CNN inference in resource-constrained edge devices. Our proposed co-scheduling technique exploits an inherent parallelism in CNN output channels: the operations that generate different output channels in a CONV layer can be executed in parallel. For load balancing between the CPU and the CNN accelerator, we also propose a simple, yet accurate, latency model for CONV layer operations on the CPU and the accelerator. Based on the latency estimates of CONV layer operations provided by our proposed model, we distribute the tasks to the CPU and the CNN accelerator in a load-balanced manner to minimize the idle period in both the CPU and the CNN accelerator during CONV layer operations. We implement our proposed hardware/software (HW/SW) co-scheduling technique on various field-programmable gate array system-on-chip (FPGA-SoC) platforms as a proof of concept. Experimental results indicate that our proposed co-scheduling technique improves system performance by $1.18\times$–$2.00\times$ with an energy reduction of 14.9%–49.7% as compared to accelerator-only execution.
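As a rough illustration of the idea (not the authors' implementation), the Python sketch below splits the output channels of one CONV layer between a CPU and an accelerator using a hypothetical linear latency model. All function names and cost parameters here are assumptions introduced for illustration; the paper's actual latency model is not reproduced.

```python
# A minimal sketch (assumed, not from the paper) of load-balanced
# output-channel partitioning between a CPU and a CNN accelerator.
# Latency-model coefficients are hypothetical placeholders.

def estimate_latency(num_channels: int, per_channel_cost: float,
                     fixed_overhead: float) -> float:
    """Simple linear latency model: fixed overhead + cost per output channel."""
    if num_channels == 0:
        return 0.0  # no work assigned, no launch overhead
    return fixed_overhead + num_channels * per_channel_cost

def split_output_channels(total_oc: int,
                          cpu_cost: float, cpu_overhead: float,
                          acc_cost: float, acc_overhead: float) -> int:
    """Return the number of output channels assigned to the CPU that
    minimizes the estimated makespan, i.e., the idle time on both sides."""
    best_oc, best_makespan = 0, float("inf")
    for cpu_oc in range(total_oc + 1):
        t_cpu = estimate_latency(cpu_oc, cpu_cost, cpu_overhead)
        t_acc = estimate_latency(total_oc - cpu_oc, acc_cost, acc_overhead)
        makespan = max(t_cpu, t_acc)  # CPU and accelerator run in parallel
        if makespan < best_makespan:
            best_oc, best_makespan = cpu_oc, makespan
    return best_oc

# Example: 64 output channels, accelerator ~8x faster per channel.
cpu_oc = split_output_channels(64, cpu_cost=1.0, cpu_overhead=0.2,
                               acc_cost=0.125, acc_overhead=0.5)
print(f"CPU computes {cpu_oc} channels; accelerator computes {64 - cpu_oc}.")
```

Because both model terms are linear in the channel count, the balanced split lands near the point where the two estimated latencies cross, assigning only a small share of channels to the slower CPU.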

Highlights

  • Recent advancements in artificial intelligence (AI), in particular convolutional neural networks (CNNs) that provide object detection and image classification, have revolutionized a number of real-life applications, such as transportation, agriculture, industrial automation, and home monitoring systems.

  • We present experimental results for our proposed co-scheduling technique across various metrics, such as latency model accuracy (measured via mean absolute percentage error (MAPE)), performance, and energy consumption.

  • We propose a CPU-accelerator co-scheduling technique to accelerate a single CONV layer operation during CNN inference at the edge.

Introduction

Recent advancements in artificial intelligence (AI), in particular convolutional neural networks (CNNs) that provide object detection and image classification, have revolutionized a number of real-life applications, such as transportation, agriculture, industrial automation, and home monitoring systems. With the proliferation of Internet of Things (IoT) devices, billions of IoT devices are connected to the Internet, generating zettabytes of data at the network edge. Big data, such as online shopping records, social media content, airline flight data, and other business-related data, has been stored and analyzed at cloud data centers.

Though some CNN architectures contain specialized layers (e.g., the Inception layer in [7]), most modern CNN architectures are composed of convolutional (CONV) layers, pooling layers, and fully connected layers. Activation is typically performed by rectified linear units (ReLUs), which generate the output feature maps (OFMs) of dimension $OH \times OW \times OC$.
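To make the OFM dimensions concrete, the sketch below (a minimal illustration, not code from the paper) computes a "valid" convolution with stride 1 and no padding; the function `conv2d`, its array layouts, and the example shapes are assumptions for illustration. Note that each output channel depends only on its own filter, which is exactly the output-channel parallelism the proposed co-scheduling exploits.

```python
import numpy as np

def conv2d(ifm: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Naive CONV layer: ifm (IH, IW, IC), weights (KH, KW, IC, OC)
    -> OFM of shape (OH, OW, OC), with stride 1 and no padding."""
    IH, IW, IC = ifm.shape
    KH, KW, _, OC = weights.shape
    OH, OW = IH - KH + 1, IW - KW + 1   # 'valid' convolution output size
    ofm = np.zeros((OH, OW, OC))
    for oc in range(OC):                # each output channel is independent
        for oh in range(OH):
            for ow in range(OW):
                window = ifm[oh:oh + KH, ow:ow + KW, :]
                ofm[oh, ow, oc] = np.sum(window * weights[:, :, :, oc])
    return ofm

ofm = conv2d(np.random.rand(8, 8, 3), np.random.rand(3, 3, 3, 16))
print(ofm.shape)  # (6, 6, 16), i.e., OH x OW x OC
```

Since the outermost loop iterates over independent output channels, any subset of its iterations can be handed to the CPU while the accelerator computes the rest.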
