Abstract

Deep learning solutions are increasingly deployed in mobile applications, at least for the inference phase. Due to the large model size and computational requirements, model compression for deep neural networks (DNNs) becomes necessary, especially considering the real-time requirements of embedded systems. In this paper, we extend prior work on systematic DNN weight pruning using ADMM (Alternating Direction Method of Multipliers). We integrate ADMM regularization with masked mapping/retraining, thereby guaranteeing solution feasibility and providing high solution quality. Besides superior performance on representative DNN benchmarks (e.g., AlexNet, ResNet), we focus on two new applications, facial emotion detection and eye tracking, and develop a top-down framework of DNN training, model compression, and acceleration on mobile devices. Experimental results show that, with negligible accuracy degradation, the proposed method achieves significant storage/memory reduction and speedup on mobile devices.
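The ADMM pruning followed by masked retraining described above can be illustrated on a toy problem. The sketch below alternates a gradient step on the augmented loss with a projection of the auxiliary variable onto the sparsity constraint, then fixes the resulting zero mask and retrains the surviving weights; the quadratic loss, function names, and hyperparameters are illustrative stand-ins, not the paper's actual training setup:

```python
import numpy as np

def project_sparse(w, k):
    """Euclidean projection onto vectors with at most k nonzeros:
    keep the k largest-magnitude entries, zero the rest."""
    z = np.zeros_like(w)
    idx = np.argsort(np.abs(w))[-k:]
    z[idx] = w[idx]
    return z

def admm_prune(w, loss_grad, k, rho=1.0, lr=0.1, iters=200):
    """ADMM weight-pruning sketch: alternate a W-update (gradient step on
    loss + (rho/2)||W - Z + U||^2), a Z-update (projection onto the
    sparsity constraint), and a dual update of U."""
    z = project_sparse(w, k)
    u = np.zeros_like(w)
    for _ in range(iters):
        w = w - lr * (loss_grad(w) + rho * (w - z + u))  # W-update
        z = project_sparse(w + u, k)                      # Z-update
        u = u + (w - z)                                   # dual update
    return w, z

def masked_retrain(w, mask, loss_grad, lr=0.1, iters=200):
    """Masked retraining: fix the zero pattern and retrain only the
    surviving weights, guaranteeing an exactly sparse (feasible) result."""
    w = w * mask
    for _ in range(iters):
        w = (w - lr * loss_grad(w)) * mask
    return w

# Toy example: quadratic loss ||w - w_star||^2, pruned to k=2 nonzeros.
w_star = np.array([3.0, -0.1, 0.05, 2.5, -0.2, 0.01])
grad = lambda w: 2.0 * (w - w_star)
w0 = np.zeros_like(w_star)
w, z = admm_prune(w0, grad, k=2)
mask = (z != 0).astype(float)   # zero pattern found by ADMM
w = masked_retrain(w, mask, grad)
print(np.count_nonzero(w))      # at most 2 nonzero weights remain
```

On this toy loss the projection keeps the two largest-magnitude target entries, and the masked retraining step recovers their optimal values exactly, mirroring how the masked mapping/retraining phase restores accuracy after ADMM regularization.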
