Abstract

Deep convolutional neural networks (DCNNs), with extensive computation, require considerable external memory bandwidth and storage for intermediate feature maps. External memory accesses for feature maps become a significant energy bottleneck for DCNN accelerators. Many works have been done on quantizing feature maps into low precision to decrease the costs for computation and storage. There is an opportunity that the large amount of correlation among channels in feature maps can be exploited to further reduce external memory access. Towards this end, we propose a novel compression framework called Significance-aware Transform-based Codec (STC). In its compression process, significance-aware transform is introduced to obtain low-correlated feature maps in an orthogonal space, as the intrinsic representations of original feature maps. The transformed feature maps are quantized and encoded to compress external data transmission. For the next layer computation, the data will be reloaded with STC's reconstruction process. The STC framework can be supported with a small set of extensions to current DCNN accelerators. We implement STC extensions to the baseline TPU architecture for hardware evaluation. The strengthened TPU achieves average reduction of 2.57x in external memory access, 1.95x~2.78x improvement of system-level energy efficiency, with a negligible accuracy loss of only 0.5%.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call