Abstract
Since convolutional neural networks (CNNs) require massive computing resources, many computing architectures have been proposed to improve the throughput and energy efficiency of CNN computation. However, these architectures require substantial data movement between the chip and off-chip memory, which leads to high energy consumption in the off-chip memory; thus, feature map (fmap) compression has been studied as a way to reduce this data movement. The design of fmap compression has therefore become one of the main research directions for improving the off-chip memory energy efficiency of CNN accelerators. In this brief, we propose a floating-point (FP) fmap compression scheme for a hardware accelerator, comprising both the hardware design and the compression algorithm. The scheme is compatible with quantization methods such as trained ternary quantization (TTQ), which quantizes only the weights, incurring little or no accuracy degradation while reducing computation cost. In addition to zero compression, we also compress the nonzero values in the fmap based on the FP format. The compression algorithm achieves low area overhead and a compression ratio comparable to the state-of-the-art on the ILSVRC 2012 dataset.
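The abstract does not detail the compression algorithm itself, so the following is only a minimal illustrative sketch of the two generic ideas it names: zero compression via a nonzero bitmap, and FP-format-based compression of the nonzero values (here approximated as dropping low-order FP16 mantissa bits). The function names, the FP16 format, NumPy usage, and the mantissa-truncation choice are all assumptions for illustration, not the authors' actual scheme.

```python
# Illustrative sketch only -- NOT the scheme proposed in the brief.
# Zero compression: a 1-bit-per-element bitmap of nonzero positions.
# Nonzero compression: keep sign/exponent and only the top mantissa bits of FP16 values.
import numpy as np

def compress_fmap(fmap, kept_mantissa_bits=6):
    """Compress a feature map into (bitmap, truncated nonzero codes, shape, drop)."""
    flat = fmap.astype(np.float16).ravel()
    nonzero_mask = flat != 0                       # zero-compression part
    bitmap = np.packbits(nonzero_mask)             # 1 bit per element
    raw = flat[nonzero_mask].view(np.uint16)       # FP16 bit patterns of nonzeros
    drop = 10 - kept_mantissa_bits                 # FP16 has 10 mantissa bits
    truncated = (raw >> drop).astype(np.uint16)    # lossy FP-format-based compression
    return bitmap, truncated, fmap.shape, drop

def decompress_fmap(bitmap, truncated, shape, drop):
    """Reverse the sketch above; dropped mantissa bits come back as zeros."""
    n = int(np.prod(shape))
    nonzero_mask = np.unpackbits(bitmap)[:n].astype(bool)
    flat = np.zeros(n, dtype=np.uint16)
    flat[nonzero_mask] = truncated.astype(np.uint16) << drop
    return flat.view(np.float16).reshape(shape)

if __name__ == "__main__":
    # ReLU-like fmap: roughly half the activations are exact zeros.
    fmap = np.maximum(np.random.randn(4, 8, 8), 0).astype(np.float16)
    packed = compress_fmap(fmap)
    restored = decompress_fmap(*packed)
    print("max abs error:", float(np.abs(fmap - restored).max()))
```

In this toy setting the compressed size is one bit per element for the bitmap plus a shortened code per nonzero value, so the overall ratio depends on fmap sparsity and on how many mantissa bits are kept; the brief's actual algorithm and its hardware implementation should be consulted for the real encoding.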