Abstract

Convolutional neural network (CNN) accelerators have achieved great success from cloud to edge scenarios. However, given the trend towards ever larger and deeper neural network models, it remains challenging to process these CNNs efficiently, especially on edge devices with limited energy budgets. Accordingly, reducing energy consumption is of paramount importance for sustainable CNN accelerators. In this paper, we propose AdaPrune, a novel pruning technique that reduces model size and computation to achieve performance improvements and energy savings for CNN accelerators. Unlike previous pruning techniques that sacrifice either computational regularity or accuracy, AdaPrune maintains both by customizing CNN pruning for the underlying accelerators to maximally leverage the benefits of sparsity. AdaPrune consists of two techniques: input channel group pruning and output channel group pruning. By analyzing the weight fetching patterns of sparse CNN accelerators, AdaPrune adaptively switches between the two techniques to guarantee that zeros are evenly distributed across fetching groups. In doing so, the pruned network structure preserves the computational regularity expected by the underlying accelerators, thereby boosting performance and energy efficiency. We evaluate AdaPrune on three sparse CNN accelerators with different spatial tiling strategies. The experimental results show that AdaPrune achieves up to 1.6× speedup and 1.5× energy savings compared to unstructured pruning.
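To make the group-pruning idea concrete, the following is a minimal sketch of group-balanced magnitude pruning. The function name `group_balanced_prune`, the group size, and the magnitude-based selection criterion are illustrative assumptions and are not taken from the paper; the abstract only specifies that zeros should be evenly distributed within each weight fetching group, along either the input or the output channel dimension.

```python
import numpy as np

def group_balanced_prune(weights, group_size, sparsity, axis):
    """Zero out the lowest-magnitude weights inside every fetch group.

    weights    : 4-D conv tensor shaped (out_ch, in_ch, kh, kw)
    group_size : number of channels fetched together (hypothetical parameter)
    sparsity   : fraction of weights pruned inside every group
    axis       : 1 to group along input channels, 0 along output channels
    """
    w = weights.copy()
    n_ch = w.shape[axis]
    for start in range(0, n_ch, group_size):
        idx = [slice(None)] * w.ndim
        idx[axis] = slice(start, min(start + group_size, n_ch))
        block = w[tuple(idx)]                    # view into the current fetch group
        k = int(block.size * sparsity)           # same zero count in every group
        if k == 0:
            continue
        # k-th smallest magnitude acts as the per-group pruning threshold
        thresh = np.partition(np.abs(block).ravel(), k - 1)[k - 1]
        block[np.abs(block) <= thresh] = 0.0
    return w

# Toy usage: prune 50% of the weights inside every input-channel fetch group.
conv_w = np.random.randn(64, 32, 3, 3)
pruned = group_balanced_prune(conv_w, group_size=8, sparsity=0.5, axis=1)
```

Because a fixed fraction is pruned inside every group rather than globally, each fetch group carries the same number of zeros, which is the kind of evenly distributed sparsity the abstract describes as preserving computational regularity for the accelerator.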
