Abstract

Normalization layers play an important role in deep network training. As one of the most popular normalization techniques, batch normalization (BN) has shown its effectiveness in accelerating model training and improving model generalization capability. The success of BN has been explained from different perspectives, such as reducing internal covariate shift, allowing the use of large learning rates, and smoothing the optimization landscape. To gain a deeper understanding of BN, in this work we prove that BN actually introduces a certain level of noise into the sample mean and variance during the training process, and that the noise level depends only on the batch size. This noise generation mechanism of BN regularizes the training process, and we present an explicit regularizer formulation of BN. Since the regularization strength of BN is determined by the batch size, a small batch size may cause under-fitting, resulting in a less effective model. To reduce the dependency of BN on batch size, we propose a momentum BN (MBN) scheme that averages the mean and variance of the current mini-batch with the historical means and variances. With a dynamic momentum parameter, we can automatically control the noise level during training. As a result, MBN works well even when the batch size is very small (e.g., 2), which is difficult for traditional BN.
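The sketch below illustrates the MBN idea described above: during training, the current mini-batch statistics are blended with historical (running) statistics before normalization, which dampens the batch-size-dependent noise. This is only a minimal illustration under assumptions; the class name, the fixed `momentum` value, and the 2-D input shape are placeholders, and the paper's dynamic momentum schedule is not reproduced here.

```python
import torch
import torch.nn as nn


class MomentumBN(nn.Module):
    """Minimal sketch of momentum batch normalization (MBN).

    Instead of normalizing with the current mini-batch statistics alone,
    the batch mean/variance are averaged with running (historical)
    statistics via a momentum parameter, reducing the noise injected
    when the batch size is small. A fixed momentum is assumed here;
    the paper uses a dynamic momentum parameter.
    """

    def __init__(self, num_features, eps=1e-5, momentum=0.9):
        super().__init__()
        self.eps = eps
        self.momentum = momentum  # assumption: fixed value for illustration
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):  # x: (batch, num_features)
        if self.training:
            batch_mean = x.mean(dim=0)
            batch_var = x.var(dim=0, unbiased=False)
            # Blend current mini-batch statistics with historical ones.
            mean = self.momentum * self.running_mean + (1 - self.momentum) * batch_mean
            var = self.momentum * self.running_var + (1 - self.momentum) * batch_var
            # Keep the blended statistics as the new history (and for inference).
            self.running_mean = mean.detach()
            self.running_var = var.detach()
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean) / torch.sqrt(var + self.eps)
        return self.weight * x_hat + self.bias
```

Setting `momentum = 0` recovers standard BN behavior during training (pure mini-batch statistics), while larger values lean more heavily on history and thus lower the effective noise level, which is the property MBN exploits for very small batches.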
