By harnessing fundamental quantum properties, a large-scale quantum computer could undermine currently deployed public-key algorithms. The post-quantum, code-based cryptosystem Classic McEliece (CM) addresses this security concern. However, its large public key (up to 1.3 MB) poses several hardware implementation challenges. In this paper, we focus on the high memory bandwidth requirements of the CM encoding function in the context of heterogeneous CPU-FPGA devices. More concretely, we target the acceleration of public-key loading and processing from any globally shared or accelerator-private memory system. We present eEnc, a novel constant-time accelerator that exploits the parallelization potential of FPGA devices to achieve high performance. Our accelerator implements the encoding and random error vector generation functions, which make up the main computational load of Encapsulation. We introduce two accelerator design variants that offer different hardware tradeoffs. For intra-accelerator data communication, and unlike other state-of-the-art (SOTA) works, we combine a streaming protocol with task-level parallelization, which removes the need to store the public key in accelerator-private memories. Our proposed design achieves record execution times, with average speedups over its SOTA counterparts ranging from 3.5× to 7.7× across the five security-level parameter sets. Our end-to-end implementation on a Zynq SoC shows an average speedup of 2.2× compared to a 64-bit vectorized CM software baseline. The elevated logic resource consumption, characteristic of HLS designs, can be readily reduced at the cost of performance.
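To illustrate why streaming removes the need to buffer the public key, the following is a minimal software sketch of the CM encoding step, C = He over GF(2) with H = (I | T), where T is the public key. It is not the eEnc design: the function name `encode_streaming`, the list-of-bits representation, and the toy dimensions are assumptions made for illustration, and the code is neither constant-time nor bit-packed as a real implementation would be.

```python
from typing import Iterable, List


def encode_streaming(e: List[int], t_rows: Iterable[List[int]]) -> List[int]:
    """Compute C = H e over GF(2), with H = (I | T), one row of T at a time.

    e      : error vector of n bits (hypothetical list-of-bits layout).
    t_rows : iterable yielding the n - k rows of T, each with k bits.
             Each row is consumed once and never stored.
    """
    syndrome = []
    for i, row in enumerate(t_rows):
        k = len(row)
        tail = e[len(e) - k:]      # bits of e multiplied by the T part of H
        bit = e[i]                 # contribution of the identity part of H
        for t_ij, e_j in zip(row, tail):
            bit ^= t_ij & e_j      # GF(2) multiply-accumulate
        syndrome.append(bit)
    return syndrome


def toy_public_key():
    """Yield the rows of a toy 2 x 3 matrix T, standing in for a key stream."""
    yield [1, 0, 1]
    yield [0, 1, 1]


ciphertext = encode_streaming([1, 0, 1, 1, 0], toy_public_key())
```

Because each output bit depends on a single row of T, the rows can be consumed as they arrive from memory, and only the error vector and the growing syndrome need to be held locally.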