Abstract

Motivated by the fact that deep neural networks (DNNs) are typically highly over-parameterized, weight-pruning-based sparse training (ST) has become a practical way to reduce training computation and compress models. However, previous pruning algorithms use either a coarse-grained or a fine-grained sparsity pattern: the former limits the achievable pruning ratio, while the latter produces a drastically irregular sparsity distribution that is computation-intensive or logically complex to implement in hardware. Meanwhile, current DNN processors focus on sparse inference and cannot support emerging ST techniques. This paper proposes a co-design approach in which the algorithm is adapted to hardware constraints and the hardware exploits the algorithm's properties to accelerate sparse training. We first present a novel pruning algorithm, hybrid weight pruning, which combines channel-wise and line-wise pruning. It reaches a considerable pruning ratio while remaining hardware-friendly. We then design a hardware architecture, the Hybrid Pruning Processing Unit (HPPU), to accelerate the proposed algorithm. It features a two-level active data selector and a sparse convolution engine, which maximize hardware utilization when handling the hybrid sparsity patterns during training. We evaluate HPPU by synthesizing it in a 28 nm CMOS technology. HPPU achieves a 50.1% higher pruning ratio than coarse-grained pruning and 1.53× higher energy efficiency than fine-grained pruning. Its peak energy efficiency is 126.04 TFLOPS/W, outperforming the state-of-the-art trainable processor GANPU by 1.67×. When training a ResNet18 model, HPPU consumes 3.72× less energy, offers a 4.69× speedup, and maintains the unpruned accuracy.
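The abstract names the two pruning granularities but does not spell out the selection criteria. Below is a minimal, hedged sketch of how one hybrid channel-wise plus line-wise magnitude-pruning step could look; the L1-norm criterion, the interpretation of a "line" as a kernel row, and the two ratio parameters are illustrative assumptions, not details published in the abstract.

    import numpy as np

    def hybrid_prune(weights, channel_ratio=0.3, line_ratio=0.3):
        """Sketch of hybrid (channel-wise + line-wise) weight pruning.

        weights: 4-D conv tensor of shape (out_ch, in_ch, kh, kw).
        channel_ratio / line_ratio: assumed hyper-parameters controlling
        how much is removed at each granularity.
        """
        w = weights.copy()

        # Channel-wise (coarse-grained) step: zero the output channels
        # with the smallest L1 norms, removing whole filters at once.
        ch_norms = np.abs(w).sum(axis=(1, 2, 3))
        n_ch = int(channel_ratio * w.shape[0])
        pruned = np.argsort(ch_norms)[:n_ch]
        w[pruned] = 0.0

        # Line-wise step: within the surviving channels, zero whole kernel
        # rows ("lines") with the smallest L1 norms, so every removed unit
        # is still a contiguous block that hardware can skip cheaply.
        kept = np.setdiff1d(np.arange(w.shape[0]), pruned)
        line_norms = np.abs(w[kept]).sum(axis=3)          # (kept, in_ch, kh)
        k = int(line_ratio * line_norms.size)
        if k > 0:
            thresh = np.partition(line_norms.ravel(), k)[k]
            w[kept] *= (line_norms >= thresh)[..., None]  # broadcast over kw
        return w

For example, hybrid_prune(np.random.randn(64, 32, 3, 3)) zeroes roughly 30% of the filters and a further 30% of the remaining kernel rows, keeping every removed unit block-regular in the spirit of the hardware-friendly sparsity the paper claims.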


