Abstract
With their high energy efficiency, processing-in-memory (PIM) arrays are increasingly used for convolutional neural network (CNN) inference. In PIM-based CNN inference, the computational latency and energy depend on how the CNN weights are mapped to the PIM array. A recent study proposed shifted and duplicated kernel (SDK) mapping, which reuses the input feature map in units of a parallel window that is convolved with duplicated kernels to obtain multiple output elements in parallel. However, the existing SDK-based mapping algorithm does not always yield the minimum number of computing cycles because it maps only square-shaped parallel windows with all input channels. In this paper, we introduce a novel mapping algorithm called variable-window SDK (VW-SDK), which adaptively determines the shape of the parallel window that minimizes the computing cycles for a given convolutional layer and PIM array. By allowing rectangular windows with partial channels, VW-SDK utilizes the PIM array more efficiently, thereby further reducing the number of computing cycles. To eliminate the inefficient computing cycles caused by residual channels, we extend VW-SDK into VWC-SDK (SDK with variable windows and channels), which additionally performs residual channel pruning. Simulations with a $512\times 512$ PIM array and ResNet-20 show that VW-SDK improves the inference speed by $1.29\times$ compared to the existing SDK-based algorithm. The results also show that residual channel pruning improves the inference speed of ResNet-20 by up to $\sim 1.38\times$ compared to the original network without pruning.
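To make the search idea concrete, the sketch below illustrates the kind of optimization VW-SDK performs: for each candidate rectangular parallel-window shape, estimate the computing cycles needed to map a convolutional layer onto a fixed-size PIM array, then keep the cheapest shape. This is a minimal sketch under a simplified cycle model; the function names (`computing_cycles`, `best_window`) and all parameters are hypothetical illustrations, not the paper's actual algorithm or its exact cost formulation.

```python
import math

# Hypothetical parameters: K = kernel size, C_in/C_out = input/output channels,
# OH/OW = output feature-map height/width, R/Cols = PIM array rows/columns.

def computing_cycles(K, C_in, C_out, OH, OW, PW_h, PW_w, R, Cols):
    # Output elements produced per placement of a PW_h x PW_w parallel window.
    out_h = PW_h - K + 1
    out_w = PW_w - K + 1
    # How many input channels' window pixels fit in the array rows at once
    # (this is where mapping partial channels, rather than all of them, helps).
    ch_per_array = R // (PW_h * PW_w)
    if ch_per_array == 0:
        return math.inf  # window too large for the array rows
    # Tiles needed along rows (channel splits) and columns (duplicated kernels).
    row_tiles = math.ceil(C_in / ch_per_array)
    col_tiles = math.ceil(out_h * out_w * C_out / Cols)
    # Window placements needed to cover the whole output feature map.
    placements = math.ceil(OH / out_h) * math.ceil(OW / out_w)
    return placements * row_tiles * col_tiles

def best_window(K, C_in, C_out, OH, OW, R, Cols, max_pw=10):
    # Exhaustively try rectangular window shapes, including non-square ones,
    # and return the shape with the fewest estimated computing cycles.
    return min(
        ((computing_cycles(K, C_in, C_out, OH, OW, h, w, R, Cols), h, w)
         for h in range(K, max_pw + 1)
         for w in range(K, max_pw + 1)),
        key=lambda t: t[0],
    )  # (cycles, PW_h, PW_w)

# Example: a 3x3 convolution with 16 -> 16 channels and a 32x32 output map,
# mapped to a 512x512 PIM array (the array size used in the paper's simulations).
print(best_window(K=3, C_in=16, C_out=16, OH=32, OW=32, R=512, Cols=512))
```

Even under this toy model, non-square windows can win: a taller or wider window trades rows (more pixels per channel) against columns (more duplicated kernels), and the cheapest balance depends on the layer's channel counts and the array dimensions, which is the intuition behind letting the window shape vary.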