Characterizing the Performance of Accelerated Jetson Edge Devices for Training Deep Learning Models

Prashanthi S.K., Yogesh Simmhan, Sai Anuroop Kesanapalli

doi:10.1145/3606376.3593530

Abstract

Deep Neural Network (DNN) models are becoming ubiquitous in a variety of contemporary domains such as autonomous vehicles, smart cities and healthcare. They help drones navigate, identify suspicious activities from safety cameras, and perform diagnostics over medical imaging. Fast DNN inferencing close to the data source is enabled by a growing class of accelerated edge devices, such as NVIDIA Jetson and Google Coral. These devices host low-power Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs) alongside ARM CPUs in a compact form factor, offering a superior performance-to-energy ratio. For example, the NVIDIA Jetson AGX Xavier kit has a 512-core Volta GPU, an 8-core ARM CPU and 32 GB of LPDDR4x memory; it operates within 65 W of power, costs US$999 and is smaller than a paperback novel. Recently, there has been a push towards training DNN models on the edge [2]. This is driven by the massive growth in data collected from edge devices in Cyber-Physical Systems (CPS) and the Internet of Things (IoT), the need to refresh models periodically, the bandwidth constraints in moving all this data to cloud data centers for training, and a heightened emphasis on privacy by retaining data on the edge. This has led to techniques such as federated and geo-distributed learning, which train DNN models locally on each edge device's data and aggregate them centrally. In this abstract, we summarise and highlight key results from our full paper [5].
