Abstract

Convolutional Neural Networks (CNNs) have been widely used in machine learning tasks. While delivering state-of-the-art accuracy, CNNs are known to be both compute- and memory-intensive. This paper presents the SqueezeFlow accelerator architecture, which exploits the sparsity of CNN models for increased efficiency. Unlike prior accelerators that trade complexity for flexibility, SqueezeFlow exploits concise convolution rules to benefit from the reduction of computation and memory accesses as well as the acceleration of existing dense architectures without intrusive PE modifications. Specifically, SqueezeFlow employs a PT-OS-sparse dataflow that removes ineffective computations while maintaining the regularity of CNN computations. We present a full design down to the layout at 65 nm, with an area of 4.80 mm² and power of 536.09 mW. The experiments show that SqueezeFlow achieves a speedup of 2.9× on VGG16 compared to the dense architectures, with an area and power overhead of only 8.8 and 15.3 percent, respectively. On three representative sparse CNNs, SqueezeFlow improves the performance and energy efficiency by 1.8× and 1.5× over the state-of-the-art sparse accelerators.
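The abstract's core claim is that skipping the multiplications contributed by zero weights reduces work without disturbing the regular structure of convolution. As a rough illustration of that principle only (a minimal sketch, not SqueezeFlow's actual PT-OS-sparse dataflow or hardware), the following compares a dense 2-D convolution, where every weight participates in a multiply-accumulate, with a sparse variant that enumerates the nonzero weights once and iterates only over them:

```python
def dense_conv2d(x, w):
    """Naive dense 2-D convolution with 'valid' padding.
    x: H×W input, w: K×K kernel, both nested lists of floats.
    Every weight triggers a multiply, even when it is zero."""
    H, W, K = len(x), len(x[0]), len(w)
    out = [[0.0] * (W - K + 1) for _ in range(H - K + 1)]
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            for r in range(K):
                for c in range(K):
                    # This multiply runs even if w[r][c] == 0 (ineffectual work).
                    out[i][j] += x[i + r][j + c] * w[r][c]
    return out

def sparse_conv2d(x, w):
    """Same convolution, but zero weights are skipped entirely.
    The nonzero coordinates are enumerated once, so the inner loop
    does only effectual multiply-accumulates."""
    H, W, K = len(x), len(x[0]), len(w)
    nz = [(r, c, w[r][c])
          for r in range(K) for c in range(K) if w[r][c] != 0]
    out = [[0.0] * (W - K + 1) for _ in range(H - K + 1)]
    for i in range(H - K + 1):
        for j in range(W - K + 1):
            for r, c, v in nz:
                out[i][j] += x[i + r][j + c] * v
    return out
```

With a kernel that is, say, 5/9 zeros, the sparse variant performs 5/9 fewer multiplies per output element while producing identical results; the dataflow contribution of the paper lies in achieving this reduction in hardware without irregular PE-level control.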
