Abstract

The advent of the quantum computer makes current public-key infrastructure insecure. Cryptography community is addressing this problem by designing, efficiently implementing, and evaluating novel public-key algorithms capable of withstanding quantum computational power. Governmental agencies, such as NIST, are promoting standardization of quantum-resistant algorithms that is expected to run for 7 years. Several modern applications must maintain permanent data secrecy; therefore, they ultimately require the use of quantum-resistant algorithms. Because algorithms are still under scrutiny for eventual standardization, the deployment of the hardware implementation of quantum-resistant algorithms is still in early stages. In this article, we propose a methodology to design programmable hardware accelerators for lattice-based algorithms, and we use the proposed methodology to implement flexible and energy efficient post-quantum cache-based accelerators for NewHope , Kyber , Dilithium , Key Consensus from Lattice ( KCL ), and R.EMBLEM submissions to the NIST standardization contest. To the best of our knowledge, we propose the first efficient domain-specific, programmable cache-based accelerators for lattice-based algorithms. We design a single accelerator for a common kernel among various schemes with different kernel sizes, i.e., loop count, and data types. This is in contrast to the traditional approach of designing one special purpose accelerators for each scheme. We validate our methodology by integrating our accelerators into an HLS-based SoC infrastructure based on the X86 processor and evaluate overall performance. Our experiments demonstrate the suitability of the approach and allow us to collect insightful information about the performance bottlenecks and the energy efficiency of the explored algorithms. Our results provide guidelines for hardware designers, highlighting the optimization points to address for achieving the highest energy minimization and performance increase. At the same time, our proposed design allows us to specify and execute new variants of lattice-based schemes with superior energy efficiency compared to the main application processor without changing the hardware acceleration platform. For example, we manage to reduce the energy consumption up to 2.1× and energy-delay product (EDP) up to 5.2× and improve the speedup up to 2.5×.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.