Abstract

In recent years, convolutional neural networks (CNNs) have achieved significant advances in various fields. However, the computation and storage overheads of CNNs are overwhelming for Internet-of-Things devices. Both network pruning algorithms and hardware accelerators have been introduced to enable CNN inference at the edge. Network pruning algorithms reduce the size and computational cost of CNNs by regularizing unimportant weights to zero. However, existing works lack intrakernel structured sparsity types that trade off sparsity against hardware efficiency, and the index storage for irregularly pruned networks is significant. Hardware accelerators exploit the sparsity of pruned CNNs to improve energy efficiency, but their processing element (PE) utilization is low because of uneven sparsity among input convolutional kernels. To overcome these problems, we propose PACA: a Pattern pruning Algorithm and Channel-fused, high-PE-utilization Accelerator for CNNs. It comprises three parts: a pattern pruning algorithm that exploits an intrakernel sparsity type and reduces index storage, a channel-fused hardware architecture that reduces the PEs' idle rate and improves performance, and a heuristic, tabu-search-based smart fusion scheduler that analyzes the idle-PE problem and schedules the channel fusion in hardware. To demonstrate the effectiveness of PACA, we implemented the software parts in Python and the hardware architecture in RTL. Experimental results on various datasets show that, compared with an existing work, PACA reduces the index storage overhead by $3.47\times$–$5.63\times$ with 3.85–9.12 average patterns, and improves hardware performance by $2.01\times$–$5.53\times$ by reducing the PEs' idle rate.
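The abstract does not give implementation details, but the core pattern-pruning idea it describes is that each convolutional kernel is restricted to one of a few shared intrakernel shapes, so only a short pattern ID needs to be stored per kernel instead of per-weight indices. The following minimal NumPy sketch illustrates that general technique under our own assumptions (binary 3x3 masks chosen by frequency of top-magnitude layouts); all function names and parameters here are hypothetical and this is not the paper's exact algorithm.

```python
# Illustrative pattern pruning: every 3x3 kernel keeps weights only at the
# positions of one binary mask ("pattern") drawn from a small shared set,
# reducing per-kernel index storage to a single pattern ID.
import numpy as np

def build_pattern_set(weights, n_patterns=8, keep=4):
    """Pick the n_patterns most frequent top-`keep`-magnitude layouts.

    weights: array of shape (out_ch, in_ch, 3, 3)
    """
    kernels = weights.reshape(-1, 9)
    layouts = {}
    for k in kernels:
        top = np.argsort(np.abs(k))[-keep:]      # positions of largest weights
        key = tuple(sorted(top.tolist()))
        layouts[key] = layouts.get(key, 0) + 1
    ranked = sorted(layouts, key=layouts.get, reverse=True)[:n_patterns]
    masks = np.zeros((len(ranked), 9), dtype=weights.dtype)
    for i, key in enumerate(ranked):
        masks[i, list(key)] = 1.0
    return masks.reshape(-1, 3, 3)

def apply_patterns(weights, patterns):
    """Assign each kernel the pattern preserving the most weight magnitude."""
    out = np.empty_like(weights)
    ids = np.empty(weights.shape[:2], dtype=np.int64)
    flat_p = patterns.reshape(len(patterns), 9)
    for o in range(weights.shape[0]):
        for i in range(weights.shape[1]):
            k = weights[o, i].reshape(9)
            scores = flat_p @ np.abs(k)           # kept magnitude per pattern
            best = int(np.argmax(scores))
            ids[o, i] = best                      # one small index per kernel
            out[o, i] = (k * flat_p[best]).reshape(3, 3)
    return out, ids

# Example: prune a random layer to 8 shared patterns of 4 weights each.
w = np.random.randn(64, 32, 3, 3).astype(np.float32)
pats = build_pattern_set(w, n_patterns=8, keep=4)
pruned, pattern_ids = apply_patterns(w, pats)
```

Because every kernel then has the same number of nonzeros and a compact pattern ID, the index overhead shrinks and sparsity becomes even across kernels, which is what lets the channel-fused architecture keep PEs busy.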
