Abstract
DenseNet achieves remarkable performance on various computer vision tasks with comparatively few parameters and operations. However, few accelerator designs target DenseNet because of its dense-connectivity structure. In this paper, we apply the binary weight method to DenseNet and propose a hybrid-pipelined architecture for FPGA-based acceleration of the binary weight DenseNet, which can be stored entirely on-chip. To handle the dense connectivity, we develop a reusable convolution unit that efficiently supports both conv1×1 and conv3×3. Moreover, we propose a theoretical method for choosing the system parallelism, which guides the top-level pipelined design toward maximum efficiency. To evaluate the proposed architecture, a binary weight DenseNet-100 model is trained on the CIFAR10 dataset and implemented on a VX690T FPGA, at the cost of a 4.18% accuracy loss. Experiments demonstrate that our architecture achieves a throughput of 514 GOPS and 889 FPS at 200 MHz, with a performance efficiency of up to 62.4%, outperforming most related works.
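The abstract does not say which binary weight scheme is used. As an illustration only, the sketch below assumes the common Binary-Weight-Network formulation (as in XNOR-Net), where each real-valued filter W is approximated by alpha * B with B = sign(W) and alpha = mean(|W|). It shows why binary weights suit an on-chip design: each weight needs one bit, and a dot product reduces to additions and subtractions followed by a single scaling multiply. The function names are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def binarize_filter(weights: List[float]) -> Tuple[float, List[int]]:
    """Approximate a real-valued filter W by alpha * B with B in {-1, +1}.

    Assumed Binary-Weight-Network scheme (not confirmed by the paper):
    with B = sign(W), alpha = mean(|W|) minimizes ||W - alpha * B||^2.
    """
    if not weights:
        raise ValueError("empty filter")
    alpha = sum(abs(w) for w in weights) / len(weights)
    # Map zero to +1 so that every weight is representable by exactly one bit.
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs


def binary_dot(alpha: float, signs: List[int], activations: List[float]) -> float:
    """Dot product with binary weights: add/subtract activations, then scale once."""
    acc = 0.0
    for s, a in zip(signs, activations):
        acc += a if s > 0 else -a
    return alpha * acc


def pack_bits(signs: List[int]) -> bytes:
    """Pack +1/-1 signs into bytes, 1 bit per weight (bit set means +1)."""
    out = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s > 0:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


if __name__ == "__main__":
    alpha, signs = binarize_filter([0.5, -0.25, 0.75, -1.0])
    print(alpha, signs)                                      # 0.625 [1, -1, 1, -1]
    print(binary_dot(alpha, signs, [1.0, 2.0, 3.0, 4.0]))   # -1.25
    print(pack_bits(signs))                                  # b'\x05'
```

Compared with 32-bit floating-point weights, the packed 1-bit representation is 32 times smaller, which is the kind of compression that makes storing an entire network in on-chip memory plausible. How the paper's reusable convolution unit maps these operations onto FPGA resources is not described in the abstract and is not modeled here.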