Abstract

Batch normalization (BN) has recently become a standard component for accelerating and improving the training of deep neural networks (DNNs). However, BN introduces additional calculations, consumes more memory, and significantly slows down training iterations. Furthermore, the nonlinear square and square-root operations in the normalization process impede low bit-width quantization techniques, which have drawn much attention from the deep learning hardware community. In this paper, we propose an L1-norm BN (L1BN) with only linear operations in both forward and backward propagation during training. L1BN is approximately equivalent to the conventional L2-norm BN (L2BN) up to a multiplicative scaling factor of √(π/2). Experiments on various convolutional neural networks and generative adversarial networks reveal that L1BN maintains the same performance and convergence rate as L2BN but with higher computational efficiency. In an application-specific integrated circuit (ASIC) synthesis with reduced resources, L1BN achieves a 25% speedup and 37% energy saving compared with the original L2BN. Our hardware-friendly normalization method not only surpasses L2BN in speed but also simplifies the design of deep learning accelerators. Last but not least, L1BN promises fully quantized training of DNNs, which empowers future artificial intelligence applications on mobile devices with transfer and continual learning capabilities.
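
To make the idea concrete, below is a minimal NumPy sketch of an L1-norm normalization forward pass based on the description in this abstract, not the authors' implementation; the function name l1_batch_norm, the epsilon value, and the per-feature batch layout are illustrative assumptions. The √(π/2) factor rescales the mean absolute deviation so that, under a Gaussian assumption, it approximates the standard deviation used by L2BN while avoiding the square and square-root operations.

    import numpy as np

    def l1_batch_norm(x, gamma, beta, eps=1e-5):
        # Illustrative L1BN forward pass (assumed layout: rows = samples,
        # columns = features; statistics are taken over the batch axis).
        mu = x.mean(axis=0)                      # batch mean per feature
        s = np.abs(x - mu).mean(axis=0)          # L1 deviation E[|x - mu|]
        # sqrt(pi/2) * s approximates the standard deviation of a Gaussian,
        # replacing the square/sqrt pipeline of L2BN with linear operations.
        x_hat = (x - mu) / (np.sqrt(np.pi / 2) * s + eps)
        return gamma * x_hat + beta              # learnable affine transform

    # Example: normalize a toy batch of 4 samples with 3 features.
    x = np.random.randn(4, 3).astype(np.float32)
    y = l1_batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))
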
