Abstract
L2 regularization for weights in neural networks is widely used as a standard training trick. In addition to the weights, batch normalization introduces an additional trainable parameter γ, which acts as a scaling factor. However, L2 regularization for γ remains largely undiscussed and is applied in different ways depending on the library and practitioner. In this article, we study whether L2 regularization for γ is valid. To explore this issue, we consider two approaches: (1) variance control to make the residual network behave like an identity mapping and (2) stable optimization through improvement of the effective learning rate. Through these two analyses, we identify the γ for which L2 regularization is desirable or undesirable and propose four guidelines for managing them. In several experiments, we observed that applying L2 regularization to applicable γ increased classification accuracy by 1% to 4%, whereas applying L2 regularization to inapplicable γ decreased classification accuracy by 1% to 3%, consistent with our four guidelines. The proposed guidelines were further validated across various tasks and architectures, including variants of residual networks and transformers.
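To illustrate the kind of per-parameter choice the abstract refers to, the following is a minimal sketch, assuming a PyTorch setup, of how weight decay (L2 regularization) is commonly applied to some parameters and withheld from others, such as batch-normalization γ and β. The split shown here is only an example; the paper's four guidelines for deciding which γ to regularize are not encoded in this snippet.

```python
import torch
import torch.nn as nn

def split_weight_decay_params(model, weight_decay=1e-4):
    """Build optimizer parameter groups with and without L2 regularization.

    Here batch-norm parameters (gamma and beta) and biases are placed in the
    no-decay group as an illustrative default; the paper's guidelines may
    assign individual gammas differently.
    """
    decay, no_decay = [], []
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
            # BN gamma (weight) and beta (bias): excluded from decay in this sketch
            no_decay.extend(p for p in module.parameters(recurse=False) if p.requires_grad)
        else:
            for name, p in module.named_parameters(recurse=False):
                if not p.requires_grad:
                    continue
                (no_decay if name == "bias" else decay).append(p)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]

# Usage example with a toy model
model = nn.Sequential(nn.Conv2d(3, 16, 3, bias=False), nn.BatchNorm2d(16), nn.ReLU())
optimizer = torch.optim.SGD(split_weight_decay_params(model), lr=0.1, momentum=0.9)
```

Because libraries differ in whether their default weight-decay setting covers γ, making the grouping explicit, as above, is what allows the guidelines studied in the paper to be applied at all.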