Perf4sight: A toolflow to model CNN training performance on Edge GPUs

Aditya Rajagopal, Christos-Savvas Bouganis

doi:10.1109/iccvw54120.2021.00112

Abstract

The increased memory and processing capabilities of today’s edge devices create opportunities for greater edge intelligence. In the domain of vision, the ability to adapt a Convolutional Neural Network’s (CNN) structure and parameters to the input data distribution leads to systems with lower memory footprint, latency and power consumption. However, due to the limited compute resources and memory budget on edge devices, it is necessary for the system to be able to predict the latency and memory footprint of the training process in order to identify favourable training configurations of the network topology and device combination for efficient network adaptation. This work proposes perf4sight, an automated methodology for developing accurate models that predict CNN training memory footprint and latency given a target device and network. This enables rapid identification of network topologies that can be retrained on the edge device with low resource consumption. With PyTorch as the framework and NVIDIA Jetson TX2 as the target device, the developed models predict training memory footprint and latency with 95% and 91% accuracy respectively for a wide range of networks, opening the path towards efficient network adaptation on edge GPUs.
