Minimizing Off-Chip Memory Access for Deep Convolutional Neural Network Training

Jijun Wang, Hongliang Li

doi:10.1007/978-981-15-2767-8_42

Abstract

Training convolutional neural networks requires a large number of operations and memory accesses, which easily runs into the “memory wall” bottleneck and lowers computational performance and efficiency. Batch Normalization (BN) effectively speeds up the convergence of deep network training, but its complex data dependences make the “memory wall” bottleneck more severe. To address the “memory wall” problem that arises when training convolutional neural networks with BN, we propose a training method that splits the BN layer and fuses the computation of multiple layers to reduce memory access during model training. First, by reordering the “CONV+BN+RELU” (CBR) block, we trade extra computation for reduced memory access during training. Second, based on the memory access characteristics of the BN layer, we divide it into two sub-layers, each fused with its adjacent layer, and recombine the CBR block into “BN_B+RELU+CONV+BN_A” (BRCB). This further reduces main-memory reads and writes during training and alleviates the “memory wall” bottleneck, improving the computational efficiency of the accelerator. Experimental results show that when training ResNet-50, Inception V3 and DenseNet models on an NVIDIA Tesla V100 GPU, the BRCB multi-layer fusion optimization reduces the amount of data accessed by 33%, 22% and 31% respectively compared with the original training method, and improves the actual computing efficiency of the V100 by 19%, 18% and 21% respectively.
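
The abstract does not spell out what each BN sub-layer computes, so the following is only a minimal sketch under a stated assumption: BN_A gathers the per-channel statistics (which could be accumulated while the convolution output is produced), and BN_B applies the normalization, scale and shift together with ReLU in a single pass (which could happen while the next convolution reads its input). The function names `bn_a_statistics`, `bn_b_relu` and `reference_bn_relu` are hypothetical, and the data is one channel flattened to a list. The sketch only checks that the split form gives the same forward result as an unfused BN followed by ReLU; it does not model GPU memory traffic.

```python
import math


def bn_a_statistics(channel_values):
    """Assumed BN_A: one pass over a channel, accumulating sum and sum of squares."""
    n = len(channel_values)
    total = 0.0
    total_sq = 0.0
    for v in channel_values:
        total += v
        total_sq += v * v
    mean = total / n
    var = total_sq / n - mean * mean  # biased variance, E[x^2] - E[x]^2
    return mean, var


def bn_b_relu(channel_values, mean, var, gamma, beta, eps=1e-5):
    """Assumed BN_B fused with ReLU: normalize, scale, shift and clamp in one pass."""
    scale = gamma / math.sqrt(var + eps)
    shift = beta - mean * scale
    return [max(v * scale + shift, 0.0) for v in channel_values]


def reference_bn_relu(channel_values, gamma, beta, eps=1e-5):
    """Unfused baseline: each step materializes a full intermediate list."""
    n = len(channel_values)
    mean = sum(channel_values) / n
    var = sum((v - mean) ** 2 for v in channel_values) / n
    normalized = [(v - mean) / math.sqrt(var + eps) for v in channel_values]
    scaled = [gamma * v + beta for v in normalized]
    return [max(v, 0.0) for v in scaled]


# Illustrative values standing in for one channel of a convolution output.
conv_out = [0.5, -1.0, 2.0, 3.5, -0.25, 1.0]
mean, var = bn_a_statistics(conv_out)
fused = bn_b_relu(conv_out, mean, var, gamma=1.5, beta=0.1)
reference = reference_bn_relu(conv_out, gamma=1.5, beta=0.1)
```

In this sketch the baseline writes two intermediate lists (`normalized` and `scaled`) before ReLU, while the split form needs only the two scalars from `bn_a_statistics` and a single pass in `bn_b_relu`. That is the kind of intermediate traffic a fused scheme aims to avoid, although the paper's actual kernels may differ.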

