A Five-Year Journey to Accelerate Homomorphic Encryption with GPUs, Demonstrated by Sub-25ms CNN Inference

Abstract

Driven by growing interest in homomorphic encryption (HE)—a next-generation encryption technology that enables direct computation on encrypted data—the DARPA DPRIVE program launched in early 2021 with the goal of developing an ASIC capable of inferring a seven-layer convolutional neural network (CNN7) on encrypted data in under 25 ms within a four-year program, a target that seemed daunting at the time. In 2025, we achieved this goal using an off-the-shelf GPU. This paper presents our journey to this remarkable milestone. We synergistically leverage a state-of-the-art GPU implementation of the CKKS HE scheme, a method to replace costly homomorphic ReLU with quadratic functions refined through knowledge distillation, and an optimized convolution algorithm that minimizes expensive homomorphic rotations. In addition, we introduce a new convolution method that further reduces rotation overhead in deeper CNN layers with smaller feature maps. Together, these efforts enable CNN7 inference in just 22.4 ms on an NVIDIA RTX 5090 GPU.
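The abstract's core trick—swapping homomorphic ReLU for a quadratic function—can be illustrated in miniature. The sketch below is not the paper's method: the paper refines the quadratic through knowledge distillation during training, whereas here, for a self-contained example, we simply least-squares-fit a degree-2 polynomial to ReLU over an assumed input range of [-1, 1]. A quadratic needs only additions and multiplications, so it can be evaluated directly under CKKS.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# HE-friendly stand-in for ReLU: a degree-2 polynomial uses only adds and
# multiplies, both of which CKKS supports natively.
def quad_act(x, coeffs):
    c0, c1, c2 = coeffs
    return c0 + c1 * x + c2 * x * x

# Illustrative fit only (the paper instead refines the quadratic via
# knowledge distillation): least-squares fit to ReLU on [-1, 1].
xs = np.linspace(-1.0, 1.0, 2001)
c2, c1, c0 = np.polyfit(xs, relu(xs), 2)  # polyfit returns high->low degree
coeffs = (c0, c1, c2)

err = np.max(np.abs(quad_act(xs, coeffs) - relu(xs)))
print(f"max |quad - relu| on [-1,1]: {err:.3f}")
```

The fit lands near f(x) ≈ 3/32 + x/2 + 15/32·x², with worst-case error around 0.09 on this range—large enough that, as the abstract notes, distillation is needed to recover accuracy in a real network.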

Similar Papers
  • Conference Article
  • Cited by 14
  • 10.1109/fccm53951.2022.9786115
FPGA Accelerator for Homomorphic Encrypted Sparse Convolutional Neural Network Inference
  • May 15, 2022
  • Yang Yang + 3 more

Homomorphic Encryption (HE) is a promising solution to the increasing concerns of privacy in machine learning. But HE-based CNN inference remains impractically slow. Pruning can significantly reduce the compute and memory footprint of CNNs. However, homomorphic encrypted Sparse Convolutional Neural Networks (SCNN) have vastly different compute and memory characteristics compared with unencrypted SCNN. Simply extending the design principles of existing SCNN accelerators may offset the potential acceleration offered by sparsity. To realize fast execution, we propose an FPGA accelerator to speed up the computation of linear layers, the main computational bottleneck in HE SCNN batch inference. First, we analyze the memory requirements of various linear layers in HE SCNN and discuss the unique challenges. Motivated by the analysis, we present a novel dataflow specially designed to optimize HE SCNN data reuse, coupled with an efficient scheduling policy that minimizes on-chip SRAM access conflicts. Leveraging the proposed dataflow and scheduling algorithm, we demonstrate the first end-to-end acceleration of HE SCNN batch inference targeting CPU-FPGA heterogeneous platforms. For a batch of 8K images, our design achieves up to 5.6× speedup in inference latency compared with the CPU-only solution for widely studied 6-layer and 11-layer HE CNNs.

  • Research Article
  • 10.46586/tches.v2026.i1.1-25
PipFHE: Resource-Efficient Privacy-Preserving Deep CNN Inference via Padded Batch Packing and Channel Merging over FHE
  • Jan 16, 2026
  • IACR Transactions on Cryptographic Hardware and Embedded Systems
  • Tianyu Wang + 5 more

Privacy-Preserving Machine Learning (PPML) has demonstrated great potential in data-sensitive industries, driving the development of low-latency CNN architectures using Fully Homomorphic Encryption (FHE). However, existing methods encounter two primary challenges when processing large image datasets: 1) Multithreading approaches, which classify one image per thread, necessitate a large number of threads and demand significant memory and CPU resources. 2) Batch packing methods are hampered by inflated ciphertext counts and inefficient handling of image padding and consecutive convolutions, limiting their use in deep networks. These issues create a clear need for more resource-efficient and architecturally flexible FHE-based CNN inference. To address this, we propose PipFHE, an FHE-friendly privacy-preserving CNN inference approach based on RNS-CKKS. We leverage the batch packing method and introduce two effective padding strategies for efficient encrypted image convolution. Furthermore, we propose a Channel Merging method, which notably reduces the number of ciphertexts, enabling deep network architectures. PipFHE is also compatible with pre-trained standard model parameters, ensuring high flexibility. Evaluations on CIFAR-10 and CIFAR-100 show that PipFHE achieves an amortized inference speedup and throughput increase of 1.35x to 1.83x compared to state-of-the-art designs on the same test platform. Moreover, PipFHE performs inference on 227 encrypted images using only 36 threads and 144 GB of memory, 2.8x lower than prior work. While PipFHE incurs an accuracy drop of <0.9% compared to plaintext inference, this strategic trade-off delivers substantial reductions in hardware requirements and enables deep network architectures. Its significant resource efficiency and support for deep networks make PipFHE a practical solution for processing large image batches in resource-constrained, privacy-sensitive cloud environments.

  • Conference Article
  • Cited by 27
  • 10.1109/hpca56546.2023.10071133
FxHENN: FPGA-based acceleration framework for homomorphic encrypted CNN inference
  • Feb 1, 2023
  • Yilan Zhu + 3 more

Fully homomorphic encryption (FHE) is a promising data privacy solution for machine learning, which allows inference to be performed on encrypted data. However, it typically incurs 5-6 orders of magnitude higher computation and storage overhead. This paper proposes the first full-fledged FPGA acceleration framework for FHE-based convolutional neural network (HE-CNN) inference. We design parameterized HE operation modules with intra- and inter-layer HE-CNN resource management based on the FPGA high-level synthesis (HLS) design flow. With sophisticated resource and performance modeling of the HE operation modules, the proposed FxHENN framework automatically performs design space exploration to determine the optimized resource provisioning and generates the accelerator circuit for a given HE-CNN model on a target FPGA device. Compared with the state-of-the-art CPU-based HE-CNN inference solution, FxHENN achieves up to 13.49X speedup in inference latency and 1187.12X energy efficiency. Meanwhile, given this is the first attempt in the literature at FPGA acceleration of full-fledged non-interactive HE-CNN inference, our results obtained on low-power FPGA devices demonstrate that HE-CNN inference for edge and embedded computing is practical.

  • Research Article
  • Cited by 3
  • 10.1109/tcc.2024.3443405
Efficient Secure CNN Inference: A Multi-Server Framework Based on Conditional Separable and Homomorphic Encryption
  • Oct 1, 2024
  • IEEE Transactions on Cloud Computing
  • Longlong Sun + 3 more

Deep learning inference has become a fundamental component of cloud service providers, while privacy issues during services have received significant attention. Although many privacy-preserving schemes have been proposed, they require further improvement. In this article, we propose Serpens, an efficient convolutional neural network (CNN) secure inference framework to protect users' uploaded data. We introduce a pair of novel concepts, namely separable and conditional separable, to determine whether a layer in CNNs can be computed over multiple servers or not. We demonstrate that linear layers are separable and construct factor-functions to reduce their overhead to nearly zero. For the two nonlinear layers, i.e., ReLU and max pooling, we design four secure protocols based on homomorphic encryption and random masks for two- and n-server settings. These protocols are essentially different from existing schemes, which are primarily based on garbled circuits. In addition, we further propose a method to split the image securely. The experimental results demonstrate that Serpens is 60×-197× faster than the previous scheme in the two-server setting. The superiority of Serpens is even more significant in the n-server setting, only less than an order of magnitude slower than performing plaintext inference over clouds.
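The claim that linear layers are "separable" across servers rests on linearity itself: if the private input is additively split into random shares, each server can apply the public weights to its share, and the recombined partial results equal the plaintext computation. The sketch below illustrates only this algebraic property (the paper's actual protocols, including its factor-functions and the HE/mask-based ReLU and max-pooling protocols, are more involved).

```python
import numpy as np

rng = np.random.default_rng(0)

# A linear layer y = W @ x + b, with public weights W, b and private input x.
W = rng.standard_normal((4, 8))
b = rng.standard_normal(4)
x = rng.standard_normal(8)          # user's private input

# Additive secret sharing: neither share alone reveals x.
r = rng.standard_normal(8)          # random mask
share1, share2 = x - r, r           # one share per server

y1 = W @ share1                     # server 1's partial result
y2 = W @ share2 + b                 # server 2's partial result (adds bias once)

y = y1 + y2                         # user recombines
assert np.allclose(y, W @ x + b)    # matches the plaintext linear layer
```

Because convolutions are linear, the same splitting applies to them; it is the nonlinear layers (ReLU, max pooling) that force the cryptographic protocols.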

  • Conference Article
  • Cited by 1
  • 10.1109/icpads60453.2023.00063
Secure CNN Training and Inference based on Multi-key Fully Homomorphic Encryption
  • Dec 17, 2023
  • Hong Qin + 3 more

Convolutional neural networks (CNN) have attracted increasing attention and are widely used in image processing, bioinformatics, and other fields. As cloud computing and multiparty computation boom, the training and inference data of convolutional neural networks often come from diverse users. These users tend to jointly perform the computation but are reluctant to share original data with others. Multi-key fully homomorphic encryption (MKFHE) supports homomorphic computation on ciphertexts encrypted with different keys, which is especially suitable for this scenario. In this paper, we first propose secure convolution, matrix multiplication, comparison, and maximum protocols based on MKFHE. We then design a secure CNN training and inference framework, outsourcing almost all computations to the cloud server. To improve efficiency, we use the key switching technique for ciphertext transformation. We prove that the proposed frameworks are secure and feasible. The theoretical and experimental analysis shows that our framework achieves a trade-off between security, efficiency, and scalability.

  • Research Article
  • Cited by 1
  • 10.1587/transfun.2024eal2090
Polynomial Approximations of ReLU for Secure CNN Inference in Homomorphic Encryption Environments
  • Jan 1, 2025
  • IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences
  • Pierpaolo Agamennone

In this paper, we introduce a novel approach to improve secure neural network inference by addressing the challenges posed by homomorphic encryption, specifically within the context of the CKKS scheme. A major limitation in homomorphic encryption is the inability to efficiently handle non-linear activation functions, such as ReLU, due to their non-polynomial nature. We propose an innovative 7th-degree polynomial approximation of the ReLU function, generated using the Remez algorithm, which closely mimics ReLU's behavior while being fully compatible with encrypted operations. To further optimize performance, we introduce dynamic domain extension techniques, which allow for efficient scaling of inputs during polynomial evaluation, significantly reducing computational overhead. Our method is validated using the MNIST dataset, demonstrating secure inference on encrypted data with 97.93% accuracy, while achieving near-plaintext performance. This work represents a significant step forward in the practical application of homomorphic encryption for neural network inference, providing a more efficient and accurate approach to approximating non-linear functions under encryption.
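A degree-7 polynomial stand-in for ReLU is easy to reproduce in spirit. The sketch below is not the paper's Remez (minimax) construction—numpy has no Remez routine—so, as a labeled stand-in, it uses a least-squares fit on [-1, 1], assuming inputs have been scaled into that range (the role the paper's domain-extension step plays). The resulting polynomial involves only additions and multiplications, so it can be evaluated under CKKS.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Illustrative stand-in for the paper's Remez approximation: a degree-7
# least-squares fit of ReLU on [-1, 1]. A true minimax fit would have a
# slightly smaller worst-case error with equioscillating sign.
xs = np.linspace(-1.0, 1.0, 4001)
coeffs = np.polyfit(xs, relu(xs), 7)   # highest-degree coefficient first
poly_relu = np.poly1d(coeffs)

err = np.max(np.abs(poly_relu(xs) - relu(xs)))
print(f"degree-7 fit, max error on [-1,1]: {err:.3f}")
```

Even this simple fit brings the worst-case error on [-1, 1] to roughly 0.05, an order of magnitude better than a quadratic, which is why higher-degree approximations are used when the multiplicative depth budget allows.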

  • Research Article
  • Cited by 89
  • 10.1109/tifs.2023.3263631
Optimized Privacy-Preserving CNN Inference With Fully Homomorphic Encryption
  • Jan 1, 2023
  • IEEE Transactions on Information Forensics and Security
  • Dongwoo Kim + 1 more

Inference of machine learning models with data privacy guarantees has been widely studied as privacy concerns draw growing attention from the community. Among others, secure inference based on Fully Homomorphic Encryption (FHE) has proven its utility by providing stringent data privacy at sometimes affordable cost. Still, previous work was restricted to shallow and narrow neural networks and simple tasks due to the high computational cost incurred by FHE. In this paper, we propose a more efficient way of evaluating convolutions with FHE, where the cost remains constant regardless of the kernel size, resulting in 12-46× timing improvement on various kernel sizes. Combining our methods with FHE bootstrapping, we achieve at least 18.9% (and 48.1%) timing reduction in homomorphic evaluation of 20-layer CNN classifiers (and a part of them) on CIFAR10/100 (and ImageNet, respectively) datasets. Furthermore, considering that our methods are effective for evaluating CNNs with intensive convolutional operations, and exploring such CNNs, we achieve at least 5× faster inference on CIFAR10/100 with FHE than prior works of the same or lower accuracy.
