High-Speed CNN Accelerator SoC Design Based on a Flexible Diagonal Cyclic Array

Dong-Yeong Lee, Hayotjon Aliev, Sang-Hoon Sim, Keon-Myung Lee, Hyung-Won Kim, Sang-Bo Park, Muhammad Junaid

doi:10.3390/electronics13081564

Open Access

Journal: Electronics
Publication Date: Apr 19, 2024
Citations: 1
License type: CC BY 4.0

Affiliation: Chungbuk National University

Abstract

The latest convolutional neural network (CNN) models for object detection use complex layered connections to process inference data. Each layer uses different kernel modes, so the hardware must support all of them at an optimized speed. In this paper, we propose a high-speed CNN accelerator based on flexible diagonal cyclic arrays (FDCAs) that supports CNNs with various kernel sizes and significantly reduces inference time. The accelerator uses four FDCAs to process 16 input channels and 8 output channels simultaneously. Each FDCA features a 4 × 8 systolic array that contains a 3 × 3 processing element (PE) array and is designed to handle the most commonly used kernel sizes. To evaluate the proposed accelerator, we mapped the widely used YOLOv5 CNN model onto it and measured the performance of the implementation on the Zynq UltraScale+ MPSoC ZCU102 FPGA. The design consumes 249,357 logic cells, 2304 DSP blocks, and only 567 KB of BRAM. In our evaluation, the YOLOv5n model achieves an accuracy of 43.1% (mAP@0.5). A prototype accelerator has also been implemented in Samsung’s 14 nm CMOS technology; it achieves a peak performance of 1.075 TOPS at a 400 MHz clock frequency.
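
The array dimensions quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is only one possible reading, not the authors' design description: it assumes that every cell of the 4 × 8 systolic array holds one 3 × 3 PE array, and that each FDCA serves 4 of the 16 input channels and all 8 output channels. All function and constant names are hypothetical.

```python
# Back-of-the-envelope check of the array sizes quoted in the abstract.
# ASSUMPTION: each cell of the 4 x 8 systolic array holds one 3 x 3 PE array.
# This is an illustrative reading, not taken from the paper's RTL.

NUM_FDCA = 4                          # "four FDCAs"
SYSTOLIC_ROWS, SYSTOLIC_COLS = 4, 8   # "4 x 8 systolic array"
PE_ROWS, PE_COLS = 3, 3               # "3 x 3 processing element (PE) array"


def pes_per_fdca() -> int:
    """PE count of one FDCA under the assumption stated above."""
    return SYSTOLIC_ROWS * SYSTOLIC_COLS * PE_ROWS * PE_COLS


def total_pes() -> int:
    """PE count of the whole accelerator (four FDCAs)."""
    return NUM_FDCA * pes_per_fdca()


def channel_view_pes(in_ch: int = 16, out_ch: int = 8, kernel: int = 3) -> int:
    """Same count seen as input channels x output channels x kernel taps."""
    return in_ch * out_ch * kernel * kernel


def implied_ops_per_cycle(peak_ops_per_s: int = 1_075_000_000_000,
                          clock_hz: int = 400_000_000) -> float:
    """Operations per clock cycle implied by the reported peak figure."""
    return peak_ops_per_s / clock_hz


if __name__ == "__main__":
    print("PEs per FDCA:", pes_per_fdca())
    print("Total PEs:", total_pes())
    print("Channel view:", channel_view_pes())
    print("Implied ops per cycle:", implied_ops_per_cycle())
```

Under this reading, the two ways of counting agree (4 × 32 × 9 equals 16 × 8 × 9). The abstract does not say how PEs map onto the 2304 DSP blocks or how operations are counted for the 1.075 TOPS figure, so the last function only reports what the published numbers imply per clock cycle.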
