Abstract

Deep convolutional neural networks offer prominent advantages in fields such as image recognition and natural language processing, but their high storage costs and massive computational loads mean they are typically deployed on servers with GPU acceleration. As autonomous driving, aerospace, and other industries evolve, some scenarios impose strict real-time detection requirements. Since sending video streams to a server for inference and returning the results is not feasible in such settings, low-power hardware acceleration options for neural networks must be investigated. In this paper, we propose an FPGA-based specialized accelerator for convolutional neural networks. We analyze the computational properties of neural networks and design a deeply pipelined, highly parallel convolutional computation structure that supports the parallel execution of each convolution module. In addition, each convolutional layer is internally divided into multiple computational units along the channel direction to further increase parallelism. We implement an on-board test of a YOLOv5s-based network on the Xilinx XC7Z100 platform. Experimental results show that the design achieves a speedup of up to 142x over an 800 MHz ARM Cortex-A9 while consuming only 4.5 W, providing a significant performance boost at low power consumption.
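The channel-direction partitioning mentioned above can be sketched as follows. This is a minimal NumPy model, not code from the paper: the input channels of a layer are split into groups, each group's partial convolution could map to a separate hardware compute unit, and the partial sums are accumulated at the end. The function names and the number of units (`n_units`) are illustrative assumptions.

```python
import numpy as np

def conv2d(x, w):
    """Plain direct convolution. x: (C_in, H, W), w: (C_out, C_in, K, K)."""
    c_out, c_in, k, _ = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - k + 1, wd - k + 1))
    for co in range(c_out):
        for i in range(h - k + 1):
            for j in range(wd - k + 1):
                # Multiply-accumulate over all input channels and the KxK window.
                out[co, i, j] = np.sum(x[:, i:i+k, j:j+k] * w[co])
    return out

def conv2d_channel_parallel(x, w, n_units=4):
    """Split input channels into n_units groups (one per hypothetical
    hardware unit); each group computes a partial convolution, and the
    partial sums are added, giving the same result as conv2d."""
    c_in = x.shape[0]
    bounds = np.linspace(0, c_in, n_units + 1, dtype=int)
    partials = [conv2d(x[lo:hi], w[:, lo:hi])
                for lo, hi in zip(bounds[:-1], bounds[1:])]
    return np.sum(partials, axis=0)
```

Because convolution is linear in the input channels, the per-group partial sums are independent of one another, which is what allows the groups to execute in parallel on separate units before a final accumulation stage.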
