Abstract

Convolutional Neural Networks (CNNs) have gained widespread popularity in the field of computer vision and image processing. Due to huge computational requirements of CNNs, dedicated hardware-based implementations are being explored to improve their performance. Hardware platforms such as Field Programmable Gate Arrays (FPGAs) are widely being used to design parallel architectures for this purpose. In this paper, we analyze Winograd minimal filtering or fast convolution algorithms to reduce the arithmetic complexity of convolutional layers of CNNs. We explore a complex design space to find the sets of parameters that result in improved throughput and power-efficiency. We also design a pipelined and parallel Winograd convolution engine that improves the throughput and power-efficiency while reducing the computational complexity of the overall system. Our proposed designs show up to 4.75× and 1.44× improvements in throughput and power-efficiency, respectively, in comparison to the state-of-the-art design while using approximately 2.67× more multipliers. Furthermore, we obtain savings of up to 53.6% in logic resources compared with the state-of-the-art implementation.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.