Abstract

Nowadays, convolutional neural networks (CNNs) are at the core of many intelligent systems, including those that run on mobile and embedded devices. However, executing computationally demanding and memory-hungry CNNs on resource-limited mobile and embedded devices is quite challenging. One of the main problems when running CNNs on such devices is the limited amount of available memory. Thus, reducing the CNN memory footprint is crucial for CNN inference on mobile and embedded devices. The CNN memory footprint is determined by the amount of memory required to store the CNN parameters (weights and biases) and the intermediate data exchanged between CNN operators. The most common approaches used to reduce the CNN memory footprint, such as pruning and quantization, reduce the memory required to store the CNN parameters. However, these approaches decrease the CNN accuracy. Moreover, with the increasing depth of state-of-the-art CNNs, the intermediate data exchanged between CNN operators takes even more space than the CNN parameters. Therefore, in this paper, we propose a novel approach that reduces the memory required to store the intermediate data exchanged between CNN operators. Unlike pruning and quantization, our approach preserves the CNN accuracy and reduces the CNN memory footprint at the cost of decreased CNN throughput. Thus, our approach is orthogonal to pruning and quantization and can be combined with them for further CNN memory footprint reduction.
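
As a rough illustration of why intermediate data can dominate the memory footprint (this sketch is not part of the paper, and the layer dimensions below are hypothetical, chosen only for the example), the following Python snippet compares the memory needed for the parameters of a single 3x3 convolutional layer with the memory needed for its output feature map, assuming 32-bit floats:

```python
# Illustrative sketch: memory estimate for one hypothetical conv layer.
# Assumes 32-bit floats, 'same' padding, stride 1, and a 224x224x64 input.
BYTES_PER_FLOAT = 4

def conv_memory(h, w, c_in, c_out, k):
    """Return (parameter_bytes, output_feature_map_bytes) for a k x k convolution."""
    params = (k * k * c_in * c_out + c_out) * BYTES_PER_FLOAT  # weights + biases
    intermediate = h * w * c_out * BYTES_PER_FLOAT             # output tensor
    return params, intermediate

params, intermediate = conv_memory(h=224, w=224, c_in=64, c_out=64, k=3)
print(f"parameters:        {params / 2**20:.2f} MiB")        # ~0.14 MiB
print(f"intermediate data: {intermediate / 2**20:.2f} MiB")   # ~12.25 MiB
```

For these (assumed) dimensions, the output feature map requires roughly two orders of magnitude more memory than the layer's weights and biases, which is why reducing the storage of intermediate data, rather than only the parameters, matters for memory-constrained inference.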
