Abstract

Convolutional Neural Networks (CNNs) are popular models that have been successfully applied to diverse domains such as vision, speech, and text. To reduce inference-time latency, it is common to employ hardware accelerators, which often require a model compression step. In contrast to most compression algorithms, which are agnostic to the underlying hardware acceleration strategy, this paper introduces a novel Symmetric $k$-means based compression algorithm that is specifically designed to support a new FPGA-based hardware acceleration scheme by reducing the number of inference-time multiply-accumulate (MAC) operations by up to 98%. First, a simple $k$-means based training approach is presented; then, as an extension, Symmetric $k$-means is proposed, which yields twice the reduction in MAC operations for the same bit-depth as the simple $k$-means approach. A comparative analysis is conducted on popular CNN architectures for tasks including classification, object detection, and end-to-end stereo matching on various datasets. For all tasks, model compression down to 3 bits is presented, while no loss in accuracy is observed for 5-bit quantization.
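The abstract does not spell out the algorithm, but the two ideas it names can be illustrated with a minimal, hypothetical sketch: simple $k$-means quantization replaces each weight with one of $2^b$ shared centroids, while a symmetric variant clusters only the weight magnitudes and reuses each centroid with both signs, so the same bit-depth needs only half as many distinct multiplier values. All function names and implementation details below are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def kmeans_1d(values, k, iters=50, seed=0):
    """Plain 1-D k-means; returns centroids and per-value assignments."""
    rng = np.random.default_rng(seed)
    centroids = rng.choice(values, size=k, replace=False)
    for _ in range(iters):
        # Assign each value to its nearest centroid.
        assign = np.argmin(np.abs(values[:, None] - centroids[None, :]), axis=1)
        # Recompute each centroid as the mean of its assigned values.
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = values[assign == j].mean()
    return centroids, assign

def quantize_simple(weights, bits):
    """Simple k-means quantization: 2**bits shared centroid values."""
    flat = weights.ravel()
    centroids, assign = kmeans_1d(flat, 2 ** bits)
    return centroids[assign].reshape(weights.shape)

def quantize_symmetric(weights, bits):
    """Symmetric variant (sketch): cluster magnitudes only and reuse each
    centroid with both signs, leaving 2**(bits-1) distinct magnitudes."""
    flat = weights.ravel()
    mags, assign = kmeans_1d(np.abs(flat), 2 ** (bits - 1))
    return (np.sign(flat) * mags[assign]).reshape(weights.shape)

w = np.random.default_rng(1).normal(size=(64, 64))
wq = quantize_symmetric(w, bits=3)
print(np.unique(np.abs(wq)).size)  # at most 4 distinct magnitudes
```

With shared centroids, an accelerator can sum the inputs paired with each centroid first and multiply once per centroid rather than once per weight; under the symmetric scheme, inputs mapped to $+c$ and $-c$ are added or subtracted into the same accumulator, which is one plausible reading of why it halves the multiplications for a given bit-depth.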
