Abstract
With more powerful yet efficient embedded devices and accelerators becoming available for Deep Neural Networks (DNNs), machine learning is becoming an integral part of edge computing. As the number of such devices increases, finding the best platform for a specific application has become more challenging. A common task for application developers is finding the most cost-effective combination of a DNN and a device while still meeting latency and accuracy requirements. In this work, we propose Blackthorn, a layer-wise latency estimation framework for embedded Nvidia GPUs based on analytical models. We provide accurate predictions for each layer, helping developers find bottlenecks and optimize the architecture of a DNN to fit target platforms. Our framework can quickly evaluate and compare large numbers of network optimizations without needing to build time-consuming execution engines. Our experimental results on Jetson TX2 and Jetson Nano devices show per-layer estimation errors of 6.104% and 5.888% Root-Mean-Square-Percentage-Error (RMSPE), respectively, which significantly outperforms current state-of-the-art methods. At the network level, the average latency error is below 3% for the tested DNNs.
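The RMSPE metric quoted above can be computed as follows; this is a minimal sketch of the standard definition (the example latency values are hypothetical, not the paper's measurements):

```python
import numpy as np

def rmspe(measured, predicted):
    """Root-Mean-Square-Percentage-Error between measured and
    predicted per-layer latencies (array-like, same length)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pct_err = (predicted - measured) / measured * 100.0
    return float(np.sqrt(np.mean(pct_err ** 2)))

# Hypothetical per-layer latencies in milliseconds
measured = [1.00, 2.00, 4.00]
predicted = [1.05, 1.90, 4.10]
print(rmspe(measured, predicted))  # percentage errors 5%, -5%, 2.5% -> about 4.33
```

A single RMSPE value summarizes the per-layer errors, penalizing large outliers more than a plain mean percentage error would.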
Highlights
Deep Neural Networks (DNNs) are widely adopted as key components in many use-cases like vision and speech processing solutions
Computer vision and machine learning engineers are often interested in quick estimations such as: can their neural network run on a specific hardware platform with a given latency? What effect does a change of parameters due to optimization, or a larger input image, have on latency? To meet latency requirements on resource-limited embedded platforms, compression techniques like quantization [6], pruning [7], and shunt connections [8] are utilized
To overcome the limitations mentioned above and fill the gap for embedded Graphics Processing Unit (GPU) platforms, we propose Blackthorn, a layer-wise latency estimation framework for Convolutional Neural Networks (CNNs) on embedded Nvidia GPUs
Summary
Deep Neural Networks (DNNs) are widely adopted as key components in many use-cases like vision and speech processing solutions. Analyzing and comparing different setups, e.g., different optimization and compression techniques applied to multiple scales of several DNN architectures, is usually extremely time-consuming: it often requires retraining the network, and most platforms require a build or compile step before execution to achieve optimal performance, further increasing the time needed to test a single network. To skip the time-consuming compile step, DNN latency prediction techniques based on analytical or statistical models have been put forward. They target either large desktop-grade GPUs [10], [11] or embedded Central Processing Units (CPUs) [12], but not more powerful embedded devices. Our contributions are: Blackthorn, a model-based framework to estimate the execution time of convolutional neural networks on embedded Nvidia platforms; an estimation method based on an analytical approach using a combination of linear and step functions; and fast platform benchmarking by finding optimized measurement points and minimizing their number.
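The combination of linear and step functions mentioned above can be illustrated with a minimal sketch. All names and coefficients here are hypothetical placeholders, not the paper's fitted values; the idea is that latency grows linearly with the layer's workload, plus discrete jumps whenever the workload crosses a tiling boundary of the GPU kernel:

```python
import math

def layer_latency(workload, c0, c1, step_size, step_cost):
    """Illustrative per-layer latency model (milliseconds):
    a linear term in the workload (e.g. number of MACs) plus a
    step term that increases each time the workload crosses a
    multiple of step_size, mimicking the wave/tiling behavior
    of GPU kernels on embedded devices."""
    linear = c0 + c1 * workload          # fixed overhead + linear scaling
    steps = math.ceil(workload / step_size)  # number of full "waves" launched
    return linear + step_cost * steps

# Hypothetical layer with 1e6 MACs and illustrative constants
t = layer_latency(1e6, c0=0.05, c1=2e-7, step_size=2.5e5, step_cost=0.01)
print(t)  # roughly 0.29 ms: 0.05 + 0.2 linear + 4 steps of 0.01
```

In practice such a model would be fitted per layer type and per device from a small set of optimized benchmark measurements, which is what allows the framework to skip building execution engines for every candidate network.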