Abstract
As a classical artificial intelligence algorithm, the convolutional neural network (CNN) plays an important role in image recognition and classification and is gradually being applied in Internet of Things (IoT) systems. This paper proposes a compact CNN accelerator for the IoT endpoint System-on-Chip (SoC) to meet the needs of CNN computation. Based on an analysis of the CNN structure, basic functional modules such as the convolution circuit and the pooling circuit are designed with a low data bandwidth and a small area, and the accelerator is constructed from four acceleration chains. After the acceleration unit design is completed, a verification SoC is built around the Cortex-M3 and the verification platform is implemented on an FPGA to evaluate the resource consumption and performance of the CNN accelerator. The CNN accelerator achieved a throughput of 6.54 GOPS (giga operations per second) while consuming 4901 LUTs and without using any hardware multipliers. The comparison shows that the proposed compact accelerator gives the Cortex-M3-based SoC a CNN computational power two times higher than that of the quad-core Cortex-A7 SoC and 67% of that of the eight-core Cortex-A53 SoC.
Highlights
With the development of Internet of Things (IoT) technology and artificial intelligence (AI) algorithms, AI computing has moved from the cloud down to the edge [1].
Based on our previous work, this paper further extends a compact convolutional neural network (CNN) accelerator design for the IoT endpoint SoC, which can further improve the performance while reducing the area of the circuit.
Most state-of-the-art neural networks contain a large number of convolution layers, such as VGG-16, which contains at least 13 convolution layers, and AlexNet, which contains five convolution layers, so the acceleration of the convolution operation is the focus of this CNN accelerator.
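To make concrete the two operations that the convolution and pooling circuits target, the following is a minimal software sketch in Python. It is not the accelerator's hardware design: the function names (conv2d, max_pool2x2, conv_mac_count), the choice of no padding and stride 1, and the 2x2 max-pooling window are all assumptions made for illustration.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding 2D convolution in the CNN sense (cross-correlation).

    `image` and `kernel` are lists of lists of numbers (hypothetical layout).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # One output pixel costs kh * kw multiply-accumulate operations.
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2x2(fmap):
    """Non-overlapping 2x2 max pooling; an odd trailing row or column is dropped."""
    return [
        [
            max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
            for c in range(0, len(fmap[0]) - 1, 2)
        ]
        for r in range(0, len(fmap) - 1, 2)
    ]


def conv_mac_count(in_h, in_w, kh, kw):
    """Multiply-accumulate count for one single-channel convolution as above."""
    return (in_h - kh + 1) * (in_w - kw + 1) * kh * kw


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    kernel = [[1, 0],
              [0, 1]]
    fmap = conv2d(image, kernel)
    print(fmap)
    print(max_pool2x2(fmap))
    print(conv_mac_count(4, 4, 2, 2))
```

The multiply-accumulate count grows with both the feature-map size and the kernel size, which is consistent with the convolution layers dominating the workload in networks such as VGG-16 and AlexNet.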
Summary
With the development of Internet of Things (IoT) technology and artificial intelligence (AI) algorithms, AI computing has moved from the cloud down to the edge [1]. As stated in [5,6], most of the network structures in CNN are deployed and implemented in hardware. This approach does achieve high computational acceleration performance, but it requires large resource and energy overheads, so it cannot be applied in resource-constrained IoT node devices. The authors designed a multi-functional CNN accelerator for the IoT SoC in [11] to meet the needs of CNN computing in the IoT scenario. That design reduces resource consumption by expanding and reusing basic module circuits. Based on our previous work, this paper further extends a compact CNN accelerator design for the IoT endpoint SoC, which can further improve the performance while reducing the area of the circuit.