Abstract
As embedded systems with limited resources, such as smartphones, have become increasingly popular, active research has recently been conducted on performing on-device deep learning in such systems. Therefore, in this study, we propose a deep learning framework that is specialized for embedded systems with limited resources, the operation processing structure of which differs from that of standard PCs. The proposed framework supports an OpenCL-based accelerator engine for accelerating deep learning operations in various embedded systems. Moreover, the parallel processing performance of OpenCL is maximized through an OpenCL kernel that is optimized for embedded GPUs and that exploits the structural characteristics of embedded systems, such as unified memory. Furthermore, an on-device optimizer for optimizing performance in on-device environments and model converters for compatibility with conventional frameworks are provided. The results of a performance evaluation show that the proposed on-device framework outperformed conventional methods.
Highlights
Deep neural networks (DNNs) have been widely adopted in various fields, such as in image and character recognition and object detection [1,2,3,4,5,6,7,8,9,10]
The ACL is only operable on ARM central processing units (CPUs) and graphics processing units (GPUs); Caffe or TensorFlow models can be used when ArmNN [34] is used, but the ACL alone cannot be linked with conventional deep learning frameworks
The accelerator engine consists of OpenCL-based BLAS (CSblas), which is optimized for embedded GPUs, and a DNN-accelerated library
Summary
Deep neural networks (DNNs) have been widely adopted in various fields, such as in image and character recognition and object detection [1,2,3,4,5,6,7,8,9,10]. In this study, we propose CitiusSynapse, a deep learning framework that is specialized for embedded systems. The proposed framework uses OpenCL [21] to accelerate deep learning operations within various embedded systems, exploiting structural characteristics of embedded systems such as unified memory shared between CPUs and GPUs. In addition, our framework provides an on-device optimizer for inference performance on embedded devices, and the deep learning core executes deep learning operations in conjunction with the accelerator engine. A performance evaluation on an embedded board shows the superiority of the proposed framework when compared with a conventional deep learning framework.
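The accelerator engine's OpenCL-based BLAS relies on GEMM kernels tiled to fit the small on-chip memories of embedded GPUs. As a rough illustration of that tiling idea only (this is not the framework's actual CSblas kernel; the function name and tile size are placeholders), the plain-C sketch below computes C = A × B block by block, where each output tile corresponds to the work an OpenCL work-group would perform:

```c
#include <stddef.h>

/* Tile edge chosen to mimic a typical embedded-GPU work-group
   (e.g. 8x8); a real OpenCL kernel would tune this per device. */
#define TILE 8

/* Blocked (tiled) GEMM: C = A * B, all row-major, n x n.
   Each (bi, bj) block is the output tile one work-group would
   compute; the bk loop mirrors staging TILE x TILE sub-blocks of
   A and B through fast local memory before accumulating. */
void gemm_tiled(const float *A, const float *B, float *C, size_t n)
{
    for (size_t i = 0; i < n * n; ++i)
        C[i] = 0.0f;

    for (size_t bi = 0; bi < n; bi += TILE)
        for (size_t bj = 0; bj < n; bj += TILE)
            for (size_t bk = 0; bk < n; bk += TILE)
                /* Multiply one TILE x TILE sub-block pair and
                   accumulate into the (bi, bj) output tile. */
                for (size_t i = bi; i < bi + TILE && i < n; ++i)
                    for (size_t k = bk; k < bk + TILE && k < n; ++k) {
                        float a = A[i * n + k];
                        for (size_t j = bj; j < bj + TILE && j < n; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

On a unified-memory system of the kind the paper targets, the corresponding OpenCL buffers could additionally be allocated for zero-copy access so the CPU and GPU share the same physical matrices without explicit transfers.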