Abstract

In recent years, convolutional neural networks (CNNs) have been widely used. However, their ever-increasing amount of parameters makes it challenging to train them with the GPUs, which is time and energy expensive. This has prompted researchers to turn their attention to training on more energy-efficient hardware. batch normalization (BN) layer has been widely used in various state-of-the-art CNNs for it is an indispensable layer in the acceleration of CNN training. As the amount of computation of the convolutional layer declines, its importance continues to increase. However, the traditional CNN training accelerators do not pay attention to the efficient hardware implementation of the BN layer. In this letter, we design an efficient CNN training architecture by using the systolic array. The processing element of the systolic array can support the BN functions both in the training process and the inference process. The BN function implemented is an improved, hardware-friendly BN algorithm, range batch normalization (RBN). The experimental results show that the implementation of RBN saves 10% hardware resources, reduces the power by 10.1%, and the delay by 4.6% on average. We implement the accelerator on the field programmable gate array VU440, and the power consumption of the its core computing engine is 8.9 W.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.