Abstract
With more powerful yet efficient embedded devices and accelerators becoming available for Deep Neural Networks (DNNs), machine learning is becoming an integral part of edge computing. As the number of such devices increases, finding the best platform for a specific application has become more challenging. A common task for application developers is finding the most cost-effective combination of a DNN and a device while still meeting latency and accuracy requirements. In this work, we propose Blackthorn, a layer-wise latency estimation framework for embedded Nvidia GPUs based on analytical models. We provide accurate predictions for each layer, helping developers find bottlenecks and optimize the architecture of a DNN to fit target platforms. Our framework can quickly evaluate and compare large numbers of network optimizations without needing to build time-consuming execution engines. Our experimental results on Jetson TX2 and Jetson Nano devices show per-layer estimation errors of 6.104% and 5.888% Root-Mean-Square-Percentage-Error (RMSPE), respectively, which significantly outperforms current state-of-the-art methods. At the network level, the average latency error is below 3% for the tested DNNs.
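The RMSPE metric quoted above can be computed as follows; this is a minimal sketch of the standard definition (the example latency values are hypothetical, not the paper's measurements):

```python
import numpy as np

def rmspe(measured, predicted):
    """Root-Mean-Square-Percentage-Error between measured and
    predicted per-layer latencies (array-like, same length)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    pct_err = (predicted - measured) / measured * 100.0
    return float(np.sqrt(np.mean(pct_err ** 2)))

# Hypothetical per-layer latencies in milliseconds
measured = [1.00, 2.00, 4.00]
predicted = [1.05, 1.90, 4.10]
print(rmspe(measured, predicted))  # percentage errors 5%, -5%, 2.5% -> about 4.33
```

A single RMSPE value summarizes the per-layer errors, penalizing large outliers more than a plain mean percentage error would.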
Highlights
Deep Neural Networks (DNNs) are widely adopted as key components in many use-cases like vision and speech processing solutions
Computer vision and machine learning engineers are often interested in quick estimations such as: can their neural network run on a specific hardware platform with a given latency? What effect does a change of parameters due to optimization, or a larger input image, have on latency? To meet latency requirements on resource-limited embedded platforms, compression techniques like quantization [6], pruning [7], and shunt connections [8] are utilized
To overcome the limitations mentioned above and fill the gap for embedded Graphics Processing Unit (GPU) platforms, we propose Blackthorn, a layer-wise latency estimation framework for Convolutional Neural Networks (CNNs) on embedded Nvidia GPUs
Summary
Deep Neural Networks (DNNs) are widely adopted as key components in many use-cases like vision and speech processing solutions. Analyzing and comparing different setups, e.g., different optimization and compression techniques applied to multiple scales of several DNN architectures, is usually extremely time-consuming: it often requires retraining the network, and most platforms require a build or compile step before execution to achieve optimal performance, further increasing the time needed to test a single network. To skip the time-consuming compile step, DNN latency prediction techniques based on analytical or statistical models have been put forward. They target either large desktop-grade GPUs [10], [11] or embedded Central Processing Units (CPUs) [12], but not more powerful embedded devices. Our contributions are: Blackthorn, a model-based framework to estimate the execution time of convolutional neural networks on embedded Nvidia platforms; an estimation method based on an analytical approach using a combination of linear and step functions; and fast platform benchmarking by finding optimized measurement points and minimizing their number.
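The combination of linear and step functions mentioned above can be illustrated with a minimal sketch. All names and coefficients here are hypothetical placeholders, not the paper's fitted values; the idea is that latency grows linearly with the layer's workload, plus discrete jumps whenever the workload crosses a tiling boundary of the GPU kernel:

```python
import math

def layer_latency(workload, c0, c1, step_size, step_cost):
    """Illustrative per-layer latency model (milliseconds):
    a linear term in the workload (e.g. number of MACs) plus a
    step term that increases each time the workload crosses a
    multiple of step_size, mimicking the wave/tiling behavior
    of GPU kernels on embedded devices."""
    linear = c0 + c1 * workload          # fixed overhead + linear scaling
    steps = math.ceil(workload / step_size)  # number of full "waves" launched
    return linear + step_cost * steps

# Hypothetical layer with 1e6 MACs and illustrative constants
t = layer_latency(1e6, c0=0.05, c1=2e-7, step_size=2.5e5, step_cost=0.01)
print(t)  # roughly 0.29 ms: 0.05 + 0.2 linear + 4 steps of 0.01
```

In practice such a model would be fitted per layer type and per device from a small set of optimized benchmark measurements, which is what allows the framework to skip building execution engines for every candidate network.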