Abstract

Private inference (PI) allows a client and a server to perform cryptographically secure deep neural network inference without disclosing their sensitive data to each other. Despite this strong security guarantee, existing models are ill-suited for PI because their heavy use of non-linear operations such as ReLU is computationally expensive over ciphertexts and therefore dominates PI latency. Previous approaches to ReLU optimization either ignore the intrinsic importance of individual ReLUs or suffer significant accuracy loss. In this work, we propose StreamliNet, an importance-driven, gradient-based framework that reduces PI latency while retaining inference accuracy. Specifically, we first introduce a novel notion of ReLU negativity as a proxy for ReLU importance within a multivariate metric that precisely identifies layer-wise ReLU budgets. StreamliNet then automates the selection of performance-insensitive ReLUs for linearization and learns a non-linearity-sparse model in which each layer retains ReLUs in appropriate counts and locations. Moreover, to reduce the activation-map discrepancy, we develop a cost-aware post-activation consistency constraint that prioritizes linearizing low-cost ReLUs while further mitigating accuracy degradation. Extensive experiments on various models and datasets demonstrate that StreamliNet outperforms state-of-the-art methods such as SNL (ICML 2022) and SENet (ICLR 2023) on CIFAR-100, achieving 3.09% higher accuracy at an iso-ReLU budget or requiring 2× fewer ReLUs at iso-accuracy.
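Since the abstract only names these components, the following is a minimal PyTorch sketch of how they might be realized: a learnable gate that interpolates between ReLU and identity (gradient-based linearization), a ReLU-negativity score as an importance proxy, and a cost-weighted post-activation consistency term. The names (GatedReLU, relu_negativity, consistency_loss) and the exact functional forms are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only; all names and functional forms are assumptions.
import torch
import torch.nn.functional as F


class GatedReLU(torch.nn.Module):
    """ReLU with a learnable per-channel gate: gate -> 1 keeps the ReLU,
    gate -> 0 replaces it with the identity (i.e., linearizes it)."""

    def __init__(self, num_channels: int):
        super().__init__()
        self.gate = torch.nn.Parameter(torch.ones(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Assumes 4D input (batch, channels, height, width).
        g = torch.sigmoid(self.gate).view(1, -1, 1, 1)
        return g * torch.relu(x) + (1.0 - g) * x


def relu_negativity(pre_activation: torch.Tensor) -> torch.Tensor:
    """Fraction of inputs each channel's ReLU actually clips (x < 0).

    A ReLU that rarely sees negative inputs behaves almost like the
    identity, making it a cheap candidate for linearization; channels
    with high negativity are kept. Aggregating such scores per layer
    is one plausible route to the layer-wise budgets the abstract
    mentions.
    """
    return (pre_activation < 0).float().mean(dim=(0, 2, 3))


def consistency_loss(student_act: torch.Tensor,
                     teacher_act: torch.Tensor,
                     relu_cost: float) -> torch.Tensor:
    """Assumed form of a cost-aware post-activation consistency term:
    penalize the activation-map discrepancy introduced by linearization,
    weighted by a per-ReLU cost factor so that the optimization steers
    linearization toward ReLUs that are cheap to remove."""
    return relu_cost * F.mse_loss(student_act, teacher_act)
```

In use, one would presumably record pre-activations on a calibration batch, rank channels by relu_negativity to set per-layer budgets, and then train the gates under the task loss plus consistency_loss until the budgeted number of gates saturate near zero.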

