Abstract

As the application of deep learning continues to grow, so does the amount of data used to make predictions. While traditionally big-data deep learning was constrained by computing performance and off-chip memory bandwidth, a new constraint has emerged: privacy. One solution is homomorphic encryption (HE). Applying HE to the client-cloud model allows cloud services to perform inferences directly on clients’ encrypted data. While HE can meet privacy constraints it introduces enormous computational challenges and remains impractically slow on current systems.This paper introduces Cheetah, a set of algorithmic and hardware optimizations for server-side HE DNN inference. Cheetah proposes HE-parameter tuning and operator scheduling optimizations, which together deliver up to $79 \times$ speedup over the state-of-the-art. However, HE inference still falls short of real-time inference speeds by nearly four orders of magnitude. Cheetah further proposes an accelerator architecture to understand the degree of speedup hardware can provide and whether it can bridge HE’s real-time performance gap. We evaluate several DNNs and find that privacy-preserving HE inference for ResNet50 can approach real-time speeds with a 587mm <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> accelerator dissipating 30W in 5nm.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call