Abstract

As embedded systems with limited resources, such as smartphones, have become increasingly popular, on-device deep learning in such systems has recently become an active research area. In this study, we therefore propose a deep learning framework specialized for embedded systems with limited resources, whose operation processing structure differs from that of standard PCs. The proposed framework provides an OpenCL-based accelerator engine for accelerating deep learning operations on various embedded systems. Moreover, the parallel processing performance of OpenCL is maximized through OpenCL kernels optimized for embedded GPUs and for the structural characteristics of embedded systems, such as unified memory. Furthermore, an on-device optimizer for improving inference performance in on-device environments, and model converters for compatibility with conventional frameworks, are provided. The results of a performance evaluation show that the proposed on-device framework outperforms conventional methods.

Highlights

  • Deep neural networks (DNNs) have been widely adopted in various fields, such as in image and character recognition and object detection [1,2,3,4,5,6,7,8,9,10]

  • The ACL is only operable on ARM central processing units (CPUs) and graphics processing units (GPUs); Caffe or TensorFlow models can be used when ArmNN [34] is used, but ACL alone cannot be linked with conventional deep learning frameworks

  • The accelerator engine consists of an OpenCL-based BLAS (CSblas), which is optimized for embedded GPUs, and a DNN acceleration library
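The BLAS routines highlighted above are the workhorse of the accelerator engine: convolution and fully connected layers in DNN libraries are commonly lowered to a general matrix multiply (GEMM). The sketch below gives a plain-C reference of the SGEMM operation that a library such as CSblas would accelerate on the GPU; the function name is hypothetical and is not CSblas's actual API.

```c
#include <stddef.h>

/* Reference SGEMM: C = alpha * A * B + beta * C, row-major storage.
   A is M x K, B is K x N, C is M x N. An OpenCL BLAS parallelizes the
   (i, j) loops across work-items; this scalar version only illustrates
   the computation being accelerated. */
void sgemm_ref(size_t M, size_t N, size_t K,
               float alpha, const float *A, const float *B,
               float beta, float *C)
{
    for (size_t i = 0; i < M; ++i) {
        for (size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (size_t k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

A convolution layer is typically mapped onto this routine via an im2col transform, so optimizing this one kernel for the embedded GPU speeds up most of the network.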


Summary

Introduction

Deep neural networks (DNNs) have been widely adopted in various fields, such as image and character recognition and object detection [1,2,3,4,5,6,7,8,9,10]. In this study, we propose CitiusSynapse, a deep learning framework specialized for embedded systems. The proposed framework performs deep learning operations based on OpenCL [21] to accelerate them within various embedded systems. By exploiting structural characteristics of embedded systems, such as unified memory shared between the CPU and GPU, our framework also provides an on-device optimizer that improves inference performance on embedded devices. The deep learning core executes deep learning operations in conjunction with the accelerator engine. Our framework was compared with conventional deep learning frameworks through a performance evaluation on an embedded board, which demonstrates the superiority of the proposed framework.
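Launching OpenCL kernels efficiently on an embedded GPU requires choosing work-group (local) sizes and padding the global NDRange to a multiple of the local size, which is what an NDRange tuner (see the NDRange Optimizer section) searches over. The helper below sketches the padding step; the function name is an assumption for illustration, not the framework's API.

```c
#include <stddef.h>

/* Round a global work size up to a multiple of the local (work-group)
   size, as OpenCL requires when an explicit local size is passed to
   clEnqueueNDRangeKernel. The kernel then guards out-of-range
   work-items with: if (get_global_id(0) >= n) return; */
size_t round_up_global(size_t global, size_t local)
{
    return ((global + local - 1) / local) * local;
}
```

An NDRange tuner would time candidate local sizes (e.g. 32, 64, 128 work-items) with the global size padded this way and keep the fastest configuration for the target GPU.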

OpenCL
Accelerated Libraries for Deep Learning
Deep Learning Frameworks
Deep Learning Core and Accelerator Engine
Data Structure with Unified Memory
Comparison
Structure
Accelerator Engine
CSblas
Asynchronous queue execution
On-Device Optimizer for Inference
NDRange Optimizer
Quantization Optimizer
Model Converter for Compatibility
Experimental Setup
Comparison of Inference Times
Comparison of Memory Usage for Inference
Findings
Conclusions and Discussion
