Abstract

On-device DNN processing has become a common interest in autonomous driving research. To improve accuracy, both the number of DNN models and their complexity have increased. In response, hardware platforms combining multicore CPUs with DNN accelerators have been released, with the GPU typically serving as the accelerator. When multiple DNN workloads arrive sporadically, the GPU can easily become oversubscribed, leading to unexpected performance bottlenecks. We propose an on-device CPU-GPU co-scheduling framework for multi-DNN execution that removes the performance barrier caused by DNN executions being bound to the GPU. Our framework fills unused CPU cycles with DNN computations to ease the computational burden on the GPU. To provide a seamless computing environment across the two core types, the framework formats each layer execution according to the computational methods supported by the CPU and GPU cores. To cope with irregular arrivals of DNN workloads and to accommodate their fluctuating demands for hardware resources, the framework dynamically selects the best-fit core type by comparing the current availability of the two core types. At core-selection time, offline-trained prediction models are used to obtain a precise estimate of the issued layer's execution time. Our framework also mitigates the large performance deviations that even identical DNN models can exhibit under the GPU-agnostic process scheduler of the underlying OS. In addition, it minimizes the memory copy overhead that inevitably occurs during data synchronization between the heterogeneous cores: we analyze the GPU-to-CPU and CPU-to-GPU transfer cases separately and apply the solution that best suits each case. For multi-DNN inference jobs on the NVIDIA Jetson AGX Xavier platform, our framework speeds up execution time by up to 46.6% over a GPU-only solution.
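To make the dynamic core selection described above concrete, the sketch below shows one plausible shape of such a dispatcher in CUDA-compatible C++. It is a minimal illustration, not ODMDEF's implementation: the Layer fields, the linear stand-ins predict_cpu_ms/predict_gpu_ms for the offline-trained prediction models, and the constant backlog probes are all assumptions introduced here.

    #include <cstdio>

    // Hypothetical per-layer descriptor; the field names are illustrative.
    struct Layer {
        double flops;   // arithmetic cost of the layer
        double bytes;   // tensor footprint of the layer
    };

    enum class Core { CPU, GPU };

    // Stand-ins for the offline-trained execution-time predictors from the
    // abstract. Real coefficients would be fitted on profiled layer runs.
    static double predict_cpu_ms(const Layer& l) {
        return 2.0e-9 * l.flops + 1.0e-9 * l.bytes;   // assumed coefficients
    }
    static double predict_gpu_ms(const Layer& l) {
        return 0.2e-9 * l.flops + 4.0e-9 * l.bytes;   // assumed coefficients
    }

    // Assumed availability probes: predicted milliseconds until the work
    // already queued on each core type drains.
    static double cpu_backlog_ms() { return 0.5; }    // placeholder value
    static double gpu_backlog_ms() { return 6.0; }    // placeholder value

    // Pick the core type with the earlier predicted finish time
    // (current backlog + predicted execution time of this layer).
    static Core select_core(const Layer& l) {
        double cpu_finish = cpu_backlog_ms() + predict_cpu_ms(l);
        double gpu_finish = gpu_backlog_ms() + predict_gpu_ms(l);
        return (cpu_finish < gpu_finish) ? Core::CPU : Core::GPU;
    }

    int main() {
        Layer conv{5.0e8, 2.0e6};   // 0.5 GFLOP layer, 2 MB of tensors
        printf("dispatch to %s\n",
               select_core(conv) == Core::CPU ? "CPU" : "GPU");
        return 0;
    }

The design point this captures is that the decision is made per layer at issue time, so a momentarily oversubscribed GPU naturally diverts work to idle CPU cores instead of queueing behind it.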
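The abstract also mentions minimizing memory copy overhead during data synchronization between the heterogeneous cores. On Jetson-class boards the CPU and GPU share physical DRAM, so one standard mechanism for avoiding explicit copies is mapped pinned (zero-copy) memory; the CUDA sketch below illustrates that mechanism only and is not claimed to be the paper's chosen solution for either transfer direction.

    #include <cuda_runtime.h>
    #include <cstdio>

    // Toy GPU stage: in-place ReLU over a buffer the CPU also addresses.
    __global__ void relu_inplace(float* x, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n && x[i] < 0.0f) x[i] = 0.0f;
    }

    int main() {
        const int n = 1 << 20;
        float *host_ptr = nullptr, *dev_alias = nullptr;

        // Allow mapped pinned allocations (implicit on recent CUDA versions).
        cudaSetDeviceFlags(cudaDeviceMapHost);

        // One buffer, two views: both core types address the same DRAM,
        // so the CPU-to-GPU handoff needs no cudaMemcpy at all.
        cudaHostAlloc((void**)&host_ptr, n * sizeof(float), cudaHostAllocMapped);
        cudaHostGetDevicePointer((void**)&dev_alias, host_ptr, 0);

        for (int i = 0; i < n; ++i)            // CPU produces the input
            host_ptr[i] = (float)(i - n / 2);

        relu_inplace<<<(n + 255) / 256, 256>>>(dev_alias, n);
        cudaDeviceSynchronize();               // publish GPU results to the CPU

        printf("x[0] = %f\n", host_ptr[0]);    // CPU consumes, again no copy
        cudaFreeHost(host_ptr);
        return 0;
    }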

Highlights


  • We present ODMDEF, a framework that provides seamless cooperative scheduling of heterogeneous cores while improving the total end-to-end latency of multiple DNN models running concurrently on an embedded system

  • We evaluate ODMDEF on the NVIDIA Jetson AGX Xavier Developer Kit [7] to demonstrate the advantages of the proposed scheduling framework with multiple DNN models



Introduction

Nowadays, deep neural networks (DNNs) have achieved great success in smart robotics [1], autonomous driving [2], [3], smart farming [4], and advertisement [5], [6]. In object detection and image analysis, DNN models have shown dominant performance advantages, so DNN technologies are gaining much interest in autonomous driving applications. Such applications usually have multiple sensor-driven procedures that take raw sensor data and issue multiple processing jobs to various DNN models pre-trained on GPU server systems. The number of DNN models used in autonomous driving is increasing, and more sophisticated models are being adopted to improve prediction accuracy. In line with this trend, studies on embedded computing with multiple DNN models are emerging. Since we center on developing a general approach that can be adopted on off-the-shelf hardware platforms, we choose the NVIDIA Jetson AGX Xavier Developer Kit [7] rather than other custom domain-specific processors.

