Abstract

With increasing concerns about privacy in deep learning, privacy-preserving neural network inference has been receiving growing attention from the community, but current implementations lack practicality due to high latency and cost. Reducing the number of ReLU operations in neural networks is well suited to latency-efficient private inference (PI). Existing ReLU-reduction methods for efficient PI usually disregard the benefits of pre-trained models and may introduce additional complexity in the era of large models. In this paper, we propose a novel method called DReP, which leverages the output states of neurons to perform deep pruning of ReLUs in neural networks for Non-Retraining and Non-Redesign (NTND) scenarios. DReP is based on the identification of a consistent correlation between the average neuronal output states and the importance of ReLUs, and it enables deep ReLU pruning on top of structured ReLU pre-pruning methods. Notably, DReP requires neither modification of the network architecture nor extensive retraining, aligning with the requirements of NTND applications. Experimental results demonstrate that our method reduces the number of ReLUs in the original network by 16.66× while maintaining network accuracy. Moreover, compared with state-of-the-art NTND methods, our approach achieves a pruning-rate improvement of 1.57×–2.32× while preserving comparable private inference accuracy.
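
To make the high-level idea concrete, the following is a minimal PyTorch sketch, not the paper's actual DReP procedure: it records the average output state of each ReLU (the per-channel fraction of positive pre-activations over calibration data) and removes ReLUs whose behavior is nearly deterministic, replacing them with a linear pass-through or a constant zero so no nonlinear operation remains. The StatefulReLU class, the prune_relus helper, and the 0.95/0.05 thresholds are illustrative assumptions, not values taken from the paper.

    # Illustrative sketch only; the selection rule and thresholds are assumptions,
    # not the DReP criterion from the paper.
    import torch
    import torch.nn as nn

    class StatefulReLU(nn.Module):
        """ReLU wrapper that tracks how often each channel's input is positive."""
        def __init__(self, num_features):
            super().__init__()
            self.register_buffer("positive_rate", torch.zeros(num_features))
            self.register_buffer("num_batches", torch.zeros(1))
            self.mode = "relu"  # "relu", "identity", or "zero"

        def forward(self, x):
            if self.mode == "relu":
                with torch.no_grad():
                    # Accumulate the per-channel fraction of positive
                    # pre-activations (the "average output state"); assumes NCHW input.
                    self.positive_rate += (x > 0).float().mean(dim=(0, 2, 3))
                    self.num_batches += 1
                return torch.relu(x)
            if self.mode == "identity":
                return x                      # ReLU removed: linear pass-through
            return torch.zeros_like(x)        # ReLU removed: always-inactive channel

    def prune_relus(model, high=0.95, low=0.05):
        """Illustrative rule: a ReLU that is almost always active behaves like the
        identity, and one that is almost never active behaves like zero, so either
        can be removed without retraining or redesigning the network."""
        for m in model.modules():
            if isinstance(m, StatefulReLU) and m.num_batches > 0:
                avg_state = m.positive_rate / m.num_batches
                if (avg_state > high).all():
                    m.mode = "identity"
                elif (avg_state < low).all():
                    m.mode = "zero"

In practice, one would first run calibration batches through the wrapped model to populate the statistics and then call prune_relus before deployment; the per-layer decision shown here is a simplification of finer-grained, per-ReLU pruning.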
