Abstract

With the rapidly growing number of weights and activations in deep neural networks (DNNs), many zero values arise that incur unnecessary computations. Beyond zero values, the zero bits within non-zero values, termed bit sparsity, are often overlooked when accelerating DNNs. This paper proposes a new accelerator design that leverages bit sparsity in both weights and activations to improve performance. To harness bit sparsity, we first propose to dynamically detect the zero bits in activations and substitute the multiply-and-accumulate (MAC) units with bit-wise shift-and-accumulate units to sustain computing parallelism. To cope with the random number and positions of zero bits, we propose activation grouping and synchronization to dynamically balance the bit-wise workload. We then apply the activation bit sparsity aware design to weights and extend it into a double bit sparsity aware architecture. We implement the two proposed accelerators on an FPGA, building upon the VTA accelerator, and seamlessly integrate them with the TVM toolchain for automatic network compilation. The experimental results show that the proposed accelerators deliver significant performance and area efficiency improvements for ResNet18, ResNet50, and VGG16. Compared with the VTA accelerator, ResNet50 achieves 2.98× speedup and 1.84× area efficiency improvement using activation bit sparsity, and 4.75× speedup and 3.36× area efficiency improvement using double bit sparsity. Compared with state-of-the-art bit-serial accelerators, our two accelerators achieve 77.3% and 34.9% area efficiency improvements using activation bit sparsity and double bit sparsity, respectively.
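
To illustrate the core idea, the following is a minimal software sketch, not the paper's hardware design: a dot product where each activation is decomposed into its set bits, so that one multiply becomes a few shift-and-accumulate steps and zero bits contribute no work. The function names and the unsigned 8-bit activation format are assumptions for illustration only.

```python
def nonzero_bit_positions(value: int):
    """Yield the positions of the set (non-zero) bits of a non-negative integer."""
    pos = 0
    while value:
        if value & 1:
            yield pos
        value >>= 1
        pos += 1

def bit_sparse_dot(weights, activations):
    """Dot product computed with shift-and-accumulate over non-zero activation bits.

    For each (w, a) pair, w * a is replaced by summing (w << p) for every set
    bit position p of a, so activations with few set bits need few steps.
    """
    acc = 0
    for w, a in zip(weights, activations):
        for p in nonzero_bit_positions(a):
            acc += w << p  # shift-and-accumulate instead of a MAC
    return acc

# Example: 3*4 + (-2)*0 + 5*17 = 97, using only 3 shift-add steps in total.
print(bit_sparse_dot([3, -2, 5], [0b00000100, 0b00000000, 0b00010001]))
```

In hardware, the number and positions of set bits vary per activation, which is why the paper introduces activation grouping and synchronization to balance this bit-wise workload across parallel units.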
