Abstract

Recent studies have brought the superior performance of deep learning to mobile devices, enabling deep learning models to run on devices with limited computing power. However, the performance of a deep learning model degrades when it is deployed on mobile devices, because each device carries different sensors. To solve this issue, the network model must be trained specifically for each mobile device. Therefore, herein, we propose an acceleration method for on-device learning that mitigates this device heterogeneity. The proposed method efficiently utilizes unified memory to reduce the latency of data transfer during network model training. In addition, we propose a layer-wise processor selection method that accounts for the latency incurred when the forward propagation step and the backpropagation step of the same layer run on different processors. The experiments were performed on an ODROID-XU4 with the ResNet-18 model, and the results indicate that the proposed method reduces latency by up to 28.4% compared to the central processing unit (CPU) and by up to 21.8% compared to the graphics processing unit (GPU). Through experiments measuring average power consumption across various batch sizes, we confirmed that performing on-device learning with the proposed method alleviates device heterogeneity.
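The layer-wise selection described in the abstract can be pictured as a small dynamic program over the unrolled training iteration (forward through all layers, then backward in reverse). Below is a minimal, hypothetical sketch of that idea; the latency table, the fixed switch penalty `SWITCH_MS`, and all names are illustrative assumptions, not values or code from the paper. On a unified-memory platform such as the ODROID-XU4, the switch penalty models synchronization overhead rather than a physical data copy.

```python
"""Illustrative layer-wise processor selection (not the paper's code)."""

PROCS = ("cpu", "gpu")
SWITCH_MS = 0.6  # assumed penalty (ms) when consecutive steps change processor

# Hypothetical profiled latencies per layer: {processor: (forward_ms, backward_ms)}.
LAYERS = [
    {"cpu": (4.1, 8.0), "gpu": (2.3, 5.1)},  # conv-like layer, GPU-friendly
    {"cpu": (0.4, 0.7), "gpu": (0.9, 1.2)},  # small layer, CPU-friendly
    {"cpu": (3.8, 7.4), "gpu": (2.0, 4.6)},  # conv-like layer, GPU-friendly
]

def select(layers, switch_ms=SWITCH_MS):
    """DP over the unrolled iteration; state = processor of the previous step.

    One iteration is forward over layers 0..n-1 followed by backward over
    layers n-1..0, so a layer whose forward and backward run on different
    processors pays the switch penalty, as the abstract describes.
    """
    steps = [(lat, 0) for lat in layers] + [(lat, 1) for lat in reversed(layers)]
    cost = {p: 0.0 for p in PROCS}  # best total latency so far, ending on p
    plan = {p: [] for p in PROCS}   # assignment achieving cost[p]
    for lat, phase in steps:
        new_cost, new_plan = {}, {}
        for p in PROCS:
            prev = min(PROCS, key=lambda q: cost[q] + (q != p) * switch_ms)
            new_cost[p] = cost[prev] + (prev != p) * switch_ms + lat[p][phase]
            new_plan[p] = plan[prev] + [p]
        cost, plan = new_cost, new_plan
    best = min(PROCS, key=cost.get)
    return cost[best], plan[best]

total_ms, assignment = select(LAYERS)
print(f"estimated iteration latency: {total_ms:.1f} ms")
print("per-step assignment (forward pass, then backward pass):", assignment)
```

Because the state only tracks which processor ran the previous step, the pass is linear in the number of layers, which keeps the selection cheap enough to run on the device itself.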

Highlights

  • Recent developments in computing hardware (e.g., graphics processing units (GPUs) and tensor processing units (TPUs) [1]) have enabled large-scale parallel processing, resulting in a substantial reduction in inference/training time for deep learning on PC/server platforms

  • Unlike existing methods that target inference, the proposed method is tailored to deep learning training by considering its cyclic (forward/backward) process; we explore its usability by applying on-device learning to the acoustic scene classification (ASC) field

  • To see how the batch size affects deep learning training speed, we investigated the neural network model's training time as a function of batch size (a measurement sketch follows this list)
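As a companion to the batch-size highlight above, the following sketch shows one way to measure per-iteration training time across batch sizes. It uses PyTorch and torchvision's ResNet-18 purely as stand-ins for the model in the paper; the actual experiments run on an ODROID-XU4 with a different software stack, so treat this as an illustration of the measurement procedure only.

```python
"""Illustrative batch-size timing loop (not the paper's code)."""
import time
import torch
import torchvision

model = torchvision.models.resnet18(num_classes=10)
loss_fn = torch.nn.CrossEntropyLoss()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
model.train()

for batch in (4, 8, 16, 32):
    x = torch.randn(batch, 3, 224, 224)      # synthetic input batch
    y = torch.randint(0, 10, (batch,))       # synthetic labels
    # Warm-up iteration so lazy allocations don't skew the timing.
    opt.zero_grad(); loss_fn(model(x), y).backward(); opt.step()
    t0 = time.perf_counter()
    for _ in range(5):
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    per_iter = (time.perf_counter() - t0) / 5
    print(f"batch {batch:3d}: {per_iter * 1000:.1f} ms/iteration")
```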


Summary

Accelerating On-Device Learning with Layer-Wise Processor Selection Method on Unified Memory

Citation: Ha, D.; Kim, M.; Moon, K.; et al. Accelerating On-Device Learning with Layer-Wise Processor Selection Method on Unified Memory. Sensors 2021, 21, 2364. https://doi.org/10.3390/s21072364

Keywords: deep learning acceleration; processor selection algorithm; on-device learning; acoustic scene classification; mobile devices

Introduction
Related Work
Hardware for Accelerating Neural Networks
Proposed Method
Data Transfer on Unified Memory
Experimental Setup
Experiments of Device Heterogeneity in ASC
Experiments of Proposed Method
Batch Size
Average Power Consumption
Deep Learning Inference
Findings
Conclusions and Discussion

