Abstract

Convolutional neural network (CNN) training often requires considerable computational resources. In recent years, several studies have proposed CNN inference and training accelerators in which FPGAs have demonstrated good performance and energy efficiency. CNN training demands substantial memory bandwidth, FPGA platform resources, time, power, and large training datasets, and existing designs are constrained by the need for improved hardware acceleration to scale beyond current data and model sizes. This paper proposes a procedure for energy-efficient CNN training on an FPGA-based accelerator. We employ quantization, a common model compression technique, to speed up the CNN training process. Additionally, a gradient accumulation buffer is used to maximize operating efficiency while preserving the gradient-descent behavior of the learning algorithm. To validate the design, we implemented the AlexNet and VGG-16 models on an FPGA board and on a laptop CPU alongside a GPU. The accelerator achieves 203.75 GOPS with the AlexNet model and 196.50 GOPS with the VGG-16 model on a Terasic DE1-SoC. Our results also show that the FPGA accelerator is more energy efficient than the other platforms.
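The two optimizations named in the abstract, weight quantization and a gradient accumulation buffer, can be illustrated with a short sketch. The following NumPy code is a minimal illustration of the general techniques only, not the paper's FPGA implementation; the symmetric per-tensor INT8 scheme, toy loss, tensor shapes, and accumulation interval are all assumptions made for the example.

```python
import numpy as np

# Sketch: forward pass with INT8-quantized weights, full-precision
# gradients accumulated in a buffer, one SGD update per accumulated batch.
# Illustrative only; not the paper's hardware design.

def quantize_int8(w):
    """Symmetric per-tensor INT8 quantization: returns int8 weights and a scale."""
    max_abs = np.max(np.abs(w))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((16, 8)).astype(np.float32)   # toy weight matrix
grad_buffer = np.zeros_like(w)                        # gradient accumulation buffer
accum_steps, lr = 4, 0.01                             # hypothetical hyperparameters

for step in range(8):
    x = rng.standard_normal((8,)).astype(np.float32)  # toy input vector
    q, scale = quantize_int8(w)
    y = dequantize(q, scale) @ x                      # forward pass with quantized weights
    # Toy loss L = 0.5 * ||y||^2, so dL/dW = outer(y, x); the gradient is
    # applied straight-through to the full-precision master weights.
    grad_buffer += np.outer(y, x)
    if (step + 1) % accum_steps == 0:
        w -= lr * grad_buffer / accum_steps           # apply averaged accumulated gradient
        grad_buffer.fill(0.0)                         # reset buffer for the next group
```

Keeping full-precision master weights while computing with quantized copies is what lets the accumulation buffer preserve ordinary gradient-descent behavior even though individual forward passes run at reduced precision.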
