Abstract
State-of-the-art training of deep neural networks requires normalizing neuron activations to accelerate the training process. A standard approach is batch normalization (BN), in which the activations are normalized by the mean and standard deviation of the training mini-batch. To keep the transformation invertible, BN also introduces a learnable gain and bias that are applied after the normalization and typically before the non-linearity. In this paper, we investigate the effects of these learnable parameters, the gain and bias, on the training of several typical deep neural networks, including All-CNN, Network In Network (NIN), and ResNets. Through extensive experiments, we show that removing the BN layer that follows the final convolutional layer of a convolutional neural network (CNN) makes little difference to either training convergence or final test accuracy on standard classification tasks. We also observe that keeping the learnable BN parameters fixed, rather than updating them adaptively, often reduces the training time of very deep networks such as ResNet-101.
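As a minimal sketch of the BN variants discussed above (not the authors' code; the layer sizes and tensor shapes below are illustrative assumptions), the following PyTorch snippet contrasts standard BN, BN without the learnable gain and bias, and BN with those parameters frozen:

```python
# Sketch of the three BN configurations compared in this work, using PyTorch.
import torch
import torch.nn as nn

# Standard BN: normalize by mini-batch mean/std, then apply a learnable
# gain (weight) and bias.
bn_standard = nn.BatchNorm2d(64)

# Normalization only: drop the learnable gain and bias entirely.
bn_no_affine = nn.BatchNorm2d(64, affine=False)

# Keep the gain and bias, but freeze them so they are not adaptively
# updated during training.
bn_frozen = nn.BatchNorm2d(64)
bn_frozen.weight.requires_grad_(False)
bn_frozen.bias.requires_grad_(False)

# A dummy mini-batch of 8 feature maps with 64 channels.
x = torch.randn(8, 64, 32, 32)
for name, bn in [("standard", bn_standard),
                 ("no affine", bn_no_affine),
                 ("frozen", bn_frozen)]:
    y = bn(x)
    print(name, float(y.mean()), float(y.std()))
```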