Abstract

Convolutional neural networks (CNNs) are widely deployed for many artificial intelligence (AI) applications, such as object detection and image classification. Due to the burgeoning revolution in edge AI, CNN hardware accelerators are also being employed in resource-constrained edge devices to achieve better performance and energy efficiency at the edge. Although CNN accelerators enable fast and energy-efficient CNN inference at the edge, the hardware resources on the edge device other than the CNN accelerator remain idle, even though they could otherwise be utilized to attain even better performance and energy efficiency for CNN inference. In this paper, we propose a CPU-accelerator co-scheduling technique for convolution (CONV) layer operations of CNN inference in resource-constrained edge devices. Our proposed co-scheduling technique exploits an inherent parallelism in CNN output channels: the operations that generate different output channels in a CONV layer can be executed in parallel. For load balancing between the CPU and the CNN accelerator, we also propose a simple, yet accurate, latency model for CONV layer operations on the CPU and the accelerator. Based on the latency estimates of CONV layer operations provided by our proposed model, we distribute the tasks to the CPU and the CNN accelerator in a load-balanced manner to minimize the idle period in both the CPU and the CNN accelerator during CONV layer operations. We implement our proposed hardware/software (HW/SW) co-scheduling technique on various field-programmable gate array system-on-chip (FPGA-SoC) platforms as a proof of concept. Experimental results indicate that our proposed co-scheduling technique improves system performance by $1.18\times$–$2.00\times$ with an energy reduction of 14.9%–49.7% as compared to accelerator-only execution.
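As a rough illustration of the idea (not the authors' implementation), the Python sketch below splits the output channels of one CONV layer between a CPU and an accelerator using a hypothetical linear latency model. All function names and cost parameters here are assumptions introduced for illustration; the paper's actual latency model is not reproduced.

```python
# A minimal sketch (assumed, not from the paper) of load-balanced
# output-channel partitioning between a CPU and a CNN accelerator.
# Latency-model coefficients are hypothetical placeholders.

def estimate_latency(num_channels: int, per_channel_cost: float,
                     fixed_overhead: float) -> float:
    """Simple linear latency model: fixed overhead + cost per output channel."""
    if num_channels == 0:
        return 0.0  # no work assigned, no launch overhead
    return fixed_overhead + num_channels * per_channel_cost

def split_output_channels(total_oc: int,
                          cpu_cost: float, cpu_overhead: float,
                          acc_cost: float, acc_overhead: float) -> int:
    """Return the number of output channels assigned to the CPU that
    minimizes the estimated makespan, i.e., the idle time on both sides."""
    best_oc, best_makespan = 0, float("inf")
    for cpu_oc in range(total_oc + 1):
        t_cpu = estimate_latency(cpu_oc, cpu_cost, cpu_overhead)
        t_acc = estimate_latency(total_oc - cpu_oc, acc_cost, acc_overhead)
        makespan = max(t_cpu, t_acc)  # CPU and accelerator run in parallel
        if makespan < best_makespan:
            best_oc, best_makespan = cpu_oc, makespan
    return best_oc

# Example: 64 output channels, accelerator ~8x faster per channel.
cpu_oc = split_output_channels(64, cpu_cost=1.0, cpu_overhead=0.2,
                               acc_cost=0.125, acc_overhead=0.5)
print(f"CPU computes {cpu_oc} channels; accelerator computes {64 - cpu_oc}.")
```

Because both model terms are linear in the channel count, the balanced split lands near the point where the two estimated latencies cross, assigning only a small share of channels to the slower CPU.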

Highlights

  • Recent advancements in artificial intelligence (AI), in particular convolutional neural networks (CNNs) that provide object detection and image classification, have revolutionized a number of real-life applications, such as transportation, agriculture, industrial automation, and home monitoring systems.

  • We present experimental results for our proposed co-scheduling technique across various metrics, such as latency model accuracy (measured via mean absolute percentage error (MAPE)), performance, and energy consumption.

  • We propose a CPU-accelerator co-scheduling technique to accelerate a single CONV layer operation during CNN inference at the edge.

Introduction

Recent advancements in artificial intelligence (AI), in particular convolutional neural networks (CNNs) that provide object detection and image classification, have revolutionized a number of real-life applications, such as transportation, agriculture, industrial automation, and home monitoring systems. With the proliferation of Internet of Things (IoT) devices, billions of IoT devices are connected to the Internet, generating zettabytes of data at the network edge. Big data, such as online shopping records, social media content, airline flight data, and other business-related data, has been stored and analyzed at cloud data centers.

Though some CNN architectures contain specialized layers (e.g., the Inception layer in [7]), most modern CNN architectures are composed of convolutional (CONV) layers, pooling layers, and fully connected layers. Activation is typically performed by rectified linear units (ReLUs), which generate the output feature maps (OFMs) of dimension $OH \times OW \times OC$.
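To make the OFM dimensions concrete, the sketch below (a minimal illustration, not code from the paper) computes a "valid" convolution with stride 1 and no padding; the function `conv2d`, its array layouts, and the example shapes are assumptions for illustration. Note that each output channel depends only on its own filter, which is exactly the output-channel parallelism the proposed co-scheduling exploits.

```python
import numpy as np

def conv2d(ifm: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Naive CONV layer: ifm (IH, IW, IC), weights (KH, KW, IC, OC)
    -> OFM of shape (OH, OW, OC), with stride 1 and no padding."""
    IH, IW, IC = ifm.shape
    KH, KW, _, OC = weights.shape
    OH, OW = IH - KH + 1, IW - KW + 1   # 'valid' convolution output size
    ofm = np.zeros((OH, OW, OC))
    for oc in range(OC):                # each output channel is independent
        for oh in range(OH):
            for ow in range(OW):
                window = ifm[oh:oh + KH, ow:ow + KW, :]
                ofm[oh, ow, oc] = np.sum(window * weights[:, :, :, oc])
    return ofm

ofm = conv2d(np.random.rand(8, 8, 3), np.random.rand(3, 3, 3, 16))
print(ofm.shape)  # (6, 6, 16), i.e., OH x OW x OC
```

Since the outermost loop iterates over independent output channels, any subset of its iterations can be handed to the CPU while the accelerator computes the rest.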
