Abstract

Convolutional neural networks (CNNs) are widely used in modern image recognition applications such as facial recognition systems. Runtime speed is a critical parameter for real-time systems. Traditional FPGA-based accelerators require either large on-chip memory or high bandwidth, and the associated memory access time slows down the network. The proposed work presents an algorithm and a corresponding hardware design for fast CNN computation using an overlap-and-add-based technique in the time domain. In the algorithm, the input images are broken into tiles that can be processed independently, without the computational overhead of the frequency domain. This also allows the convolution to be computed concurrently, resulting in higher throughput and lower power consumption. At the same time, we maintain the low on-chip memory requirements needed for faster and cheaper processor designs. We implemented the VGG-16 and AlexNet CNN models with our design on Xilinx Virtex-7 and Zynq boards. Our design achieves 48% higher throughput than the state-of-the-art AlexNet implementation and uses 68.85% fewer multipliers and other resources than the state-of-the-art VGG-16 implementation.
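
To illustrate the time-domain overlap-and-add tiling described above, the following is a minimal NumPy/SciPy sketch, not the paper's hardware design: the function name overlap_add_conv2d, the tile size, and the use of scipy.signal.convolve2d for the per-tile convolution are illustrative assumptions. Each tile is convolved independently (as a hardware unit could do in parallel), and the overlapping borders of the partial outputs are accumulated into the full result.

    import numpy as np
    from scipy.signal import convolve2d

    def overlap_add_conv2d(image, kernel, tile=32):
        # Time-domain overlap-and-add: split the image into tiles,
        # convolve each tile independently, and accumulate the partial
        # outputs so the overlapping borders sum to the full convolution.
        kh, kw = kernel.shape
        H, W = image.shape
        out = np.zeros((H + kh - 1, W + kw - 1))
        for r in range(0, H, tile):
            for c in range(0, W, tile):
                block = image[r:r + tile, c:c + tile]
                # Per-tile convolution; tiles never need the whole image,
                # so on-chip storage stays small.
                partial = convolve2d(block, kernel, mode="full")
                out[r:r + partial.shape[0], c:c + partial.shape[1]] += partial
        return out

    # Quick check against a direct full convolution (hypothetical sizes).
    img = np.random.rand(64, 64)
    ker = np.random.rand(3, 3)
    assert np.allclose(overlap_add_conv2d(img, ker), convolve2d(img, ker, mode="full"))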
