Abstract
State-of-the-art training of deep neural networks requires normalizing neuron activations to accelerate the training process. A standard approach is batch normalization (BN), in which the activations are normalized by the mean and standard deviation of the training mini-batch. To keep the transformation invertible, BN also introduces a learnable gain and bias that are applied after the normalization and typically before the non-linearity. In this paper, we investigate the effects of these learnable parameters, the gain and bias, on the training of several typical deep neural networks, including All-CNN, Network In Network (NIN), and ResNets. Through extensive experiments, we show that removing the BN layer that follows the final convolutional layer of a convolutional neural network (CNN) makes little difference to either training convergence or final test accuracy on standard classification tasks. We also observe that keeping the learnable BN parameters fixed, rather than updating them adaptively, often reduces the training time of very deep networks such as ResNet-101.
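As a minimal sketch of the BN variants discussed above (not the authors' code; the layer sizes and tensor shapes below are illustrative assumptions), the following PyTorch snippet contrasts standard BN, BN without the learnable gain and bias, and BN with those parameters frozen:

```python
# Sketch of the three BN configurations compared in this work, using PyTorch.
import torch
import torch.nn as nn

# Standard BN: normalize by mini-batch mean/std, then apply a learnable
# gain (weight) and bias.
bn_standard = nn.BatchNorm2d(64)

# Normalization only: drop the learnable gain and bias entirely.
bn_no_affine = nn.BatchNorm2d(64, affine=False)

# Keep the gain and bias, but freeze them so they are not adaptively
# updated during training.
bn_frozen = nn.BatchNorm2d(64)
bn_frozen.weight.requires_grad_(False)
bn_frozen.bias.requires_grad_(False)

# A dummy mini-batch of 8 feature maps with 64 channels.
x = torch.randn(8, 64, 32, 32)
for name, bn in [("standard", bn_standard),
                 ("no affine", bn_no_affine),
                 ("frozen", bn_frozen)]:
    y = bn(x)
    print(name, float(y.mean()), float(y.std()))
```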