Abstract

Convolutional neural networks (CNNs) have been widely adopted in modern artificial intelligence (AI) systems. In particular, GoogLeNet, one of the most popular CNNs, which consists of a number of inception layers and max-pooling layers, has been studied intensively for mobile and embedded scenarios. However, the energy efficiency of GoogLeNet in hardware is still limited by the large volume of data movement between the processor and memory. Designing a dataflow and a corresponding hardware architecture that achieve parallel processing with minimal data movement is therefore critical for high energy efficiency and throughput. In this paper, we propose a novel column stationary (CS) dataflow that maximally exploits local data reuse of both filter weights and feature maps. Moreover, we propose a reconfigurable spatial architecture that maps multiple convolution kernels (of different types and dimensions) in parallel onto the processing engine (PE) array, so that the kernels can share the same input feature maps (activations) during computation. In our hardware design, we use the three typical convolution kernel sizes of the GoogLeNet inception layers (i.e., 5 × 5, 3 × 3, and 1 × 1) as an example to evaluate the efficiency of the proposed dataflow and hardware architecture. The accelerator was implemented for one inception layer of GoogLeNet in a 55-nm foundry CMOS process.
The test results show that our CS dataflow reduces the energy consumption of memory accesses by approximately 85% and saves 13% of area and 12% of power for computation. In summary, our CS dataflow is 1.2× to 2.5× more energy-efficient than state-of-the-art dataflows.
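To illustrate the activation-sharing idea behind the proposed architecture, the sketch below applies 5 × 5, 3 × 3, and 1 × 1 kernels (the inception-layer sizes mentioned above) to one shared input feature map. This is a minimal NumPy model of the computation only, not the authors' hardware or dataflow; the function names and the naive single-channel, valid-padding convolution are illustrative assumptions.

```python
import numpy as np

def conv2d(x, w):
    """Naive single-channel, valid-padding convolution (illustrative only)."""
    kh, kw = w.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def inception_branches(x, kernels):
    """Apply several kernels of different sizes to the SAME input feature map.
    In hardware, the activations of x would be fetched from memory once and
    broadcast to the PE groups mapped to each kernel, instead of re-read per
    kernel -- this reuse is what cuts memory-access energy."""
    return [conv2d(x, w) for w in kernels]

# Toy 8x8 input shared by 5x5, 3x3, and 1x1 kernels, as in an inception layer.
x = np.arange(64, dtype=float).reshape(8, 8)
kernels = [np.ones((5, 5)), np.ones((3, 3)), np.ones((1, 1))]
outs = inception_branches(x, kernels)
print([o.shape for o in outs])  # valid-conv sizes: (4, 4), (6, 6), (8, 8)
```

A software loop reads `x` three times, once per branch; the point of an activation-sharing spatial architecture is that the equivalent hardware reads each activation once and reuses it across all mapped kernels.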
