Abstract

In recent years, deep neural networks (DNNs) have been widely applied to various tasks, demonstrating outstanding performance. To further extend their use in practical applications, efficient hardware implementation of DNNs is becoming a critical issue. With the rise of online learning, training DNNs on resource-constrained platforms has recently attracted increasing attention. In this paper, we propose an FPGA-based accelerator for efficient DNN training. First, a reconfigurable processing element is designed that flexibly supports the various computation patterns encountered during training within a unified architecture. Second, a well-optimized architecture is presented to perform the computation of batch normalization layers in the different stages of training. Finally, a prevailing model (ResNet-20) for the CIFAR-10 dataset is implemented on the Xilinx VC706 platform with our framework. Experimental results show that our design achieves 421 GOPS and 43.18 GOPS/W in terms of throughput and energy efficiency, respectively. Comparison results illustrate that our accelerator significantly outperforms prior works.
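For reference, the batch normalization computations that such an architecture must cover in the forward and backward stages of training can be sketched as follows. This is a minimal sketch assuming the standard formulation of Ioffe and Szegedy; the abstract does not detail the paper's exact formulation. In the forward stage, for a mini-batch of size $m$,

\[
\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad
\sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2, \qquad
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad
y_i = \gamma \hat{x}_i + \beta,
\]

while the backward stage additionally requires the gradients

\[
\frac{\partial L}{\partial \gamma} = \sum_{i=1}^{m} \frac{\partial L}{\partial y_i}\,\hat{x}_i, \qquad
\frac{\partial L}{\partial \beta} = \sum_{i=1}^{m} \frac{\partial L}{\partial y_i}, \qquad
\frac{\partial L}{\partial x_i} = \frac{\gamma}{\sqrt{\sigma_B^2 + \epsilon}}
\left( \frac{\partial L}{\partial y_i} - \frac{1}{m}\frac{\partial L}{\partial \beta} - \frac{\hat{x}_i}{m}\frac{\partial L}{\partial \gamma} \right).
\]

The forward pass reduces to per-channel accumulations (mean and variance) followed by an elementwise scale and shift, and the backward pass to the same accumulation-then-elementwise pattern, which is why these stages lend themselves to a shared, reconfigurable datapath.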
