Abstract

Convolutional Neural Networks (CNN) are widely used for image classification and have achieved significantly accurate performance in the last decade. However, they require computationally intensive operations for embedded applications. In recent years, FPGA-based CNN accelerators have been proposed to improve energy efficiency and throughput. While dynamic partial reconfiguration (DPR) is increasingly used in CNN accelerators, the performance of dynamically reconfigurable accelerators is usually lower than the performance of pure static FPGA designs. This work presents a dynamically reconfigurable CNN accelerator architecture that does not sacrifice throughput performance or classification accuracy. The proposed accelerator is composed of reconfigurable macroblocks and dynamically utilizes the device resources according to model parameters. Moreover, we devise a novel approach, to the best of our knowledge, to hide the computations of the pooling layers inside the convolutional layers, thereby further improving throughput. Using the proposed architecture and DPR, different CNN architectures can be realized on the same FPGA with optimized throughput and accuracy. The proposed architecture is evaluated by implementing two different LeNet CNN models trained by different datasets and classifying different classes. Experimental results show that the implemented design achieves higher throughput than current LeNet FPGA accelerators.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call