Abstract

With the growing demand for deep learning applications on edge devices, on-device DNN training has become a major workload for adapting vision models to individual users. Accordingly, architectures co-designed with algorithms to accelerate the training process have been studied extensively. However, previous solutions mostly extend techniques developed for inference, such as sparsity, dataflow optimization, and quantization. Moreover, most works evaluate their schemes on from-scratch training, which cannot tolerate inaccurate computation. As a result, several factors that slow down DNN training in practical workloads remain unaddressed. In this work, we propose a runtime convergence monitor that achieves large computational savings in a practical on-device training workload: transfer learning-based task adaptation. By monitoring the network's output data, we determine the training intensity required by an incoming task and adaptively detect convergence over iteration intervals across diverse datasets. Furthermore, we skip the computation of images already deemed converged, based on their monitored prediction probabilities, to enhance training speed within each iteration. The result is accurate but fast convergence in task-adaptation training with minimal overhead. Unlike previous approximation methods, our monitoring system performs its optimization at runtime and can be applied to any type of accelerator to attain significant speedup. Evaluations on various datasets show a geomean speedup of 2.2× on systolic-array architectures and a further improvement to 3.6× on accelerators dedicated to on-device training.
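To make the mechanism concrete, the sketch below illustrates the two ideas the abstract describes: skipping the computation of individual images once their monitored prediction probability indicates convergence, and detecting overall convergence over iteration intervals. This is a minimal PyTorch sketch under our own assumptions; the hyperparameters (SKIP_PROB, WINDOW, EPS), the loader interface, and the loop structure are illustrative, not the authors' implementation, which targets a hardware monitor running alongside an accelerator.

```python
import torch
import torch.nn.functional as F

# Illustrative hyperparameters (assumptions, not values from the paper)
SKIP_PROB = 0.95   # per-image convergence threshold on prediction probability
WINDOW    = 50     # iteration interval between convergence checks
EPS       = 1e-3   # tolerance on the change of the monitored loss

def train_with_monitor(model, loader, optimizer, max_iters=10_000):
    """Fine-tune `model`, skipping converged images and stopping early.

    `loader` is assumed to yield (images, labels, idx) triples, where
    `idx` holds each sample's dataset index so skips can be tracked.
    """
    active = torch.ones(len(loader.dataset), dtype=torch.bool)
    recent, it = [], 0
    while it < max_iters:
        for images, labels, idx in loader:
            keep = active[idx]
            if not keep.any():
                continue  # whole batch already converged: skip its computation
            images, labels = images[keep], labels[keep]

            logits = model(images)
            loss = F.cross_entropy(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            # (1) Computation skip: retire images whose correct-class
            # probability is already high enough.
            with torch.no_grad():
                p_true = F.softmax(logits, dim=1)[torch.arange(len(labels)), labels]
                active[idx[keep][p_true > SKIP_PROB]] = False

            # (2) Interval-based convergence detection on the monitored loss:
            # stop once the mean loss stops improving across a window.
            recent.append(loss.item())
            it += 1
            if it % WINDOW == 0:
                half = WINDOW // 2
                if abs(sum(recent[-half:]) / half - sum(recent[:half]) / half) < EPS:
                    return it  # loss has plateaued: training converged
                recent.clear()
            if it >= max_iters:
                break
    return it
```

Because both decisions depend only on data the training loop already produces (the output probabilities and the loss), such a monitor adds little overhead and, per the abstract, can sit beside any accelerator architecture rather than being tied to one datapath.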
