Recently, the construction of cryptographic schemes based on hard lattice problems has gained immense popularity. Apart from being quantum resistant, lattice-based cryptography allows a wide range of variations in the underlying hard problem. As cryptographic schemes can work in different environments under different operational constraints such as memory footprint, silicon area, efficiency, power requirement, and so on, such variations in the underlying hard problem are very useful for designers to construct different cryptographic schemes. In this work, we explore various design choices of lattice-based cryptography and their impact on performance in the real world. In particular, we propose a suite of key-encapsulation mechanisms based on the learning with rounding problem with a focus on improving different performance aspects of lattice-based cryptography. Our suite consists of three schemes. Our first scheme is Florete, which is designed for efficiency. The second scheme is Espada, which is aimed at improving parallelization, flexibility, and memory footprint. The last scheme is Sable, which can be considered an improved version in terms of key sizes and parameters of the Saber key-encapsulation mechanism, one of the finalists in the National Institute of Standards and Technology’s post-quantum standardization procedure. In this work, we have described our design rationale behind each scheme. Furthermore, to demonstrate the justification of our design decisions, we have provided software and hardware implementations. Our results show Florete is faster than most state-of-the-art KEMs on software platforms. For example, the key-generation algorithm of high-security version Florete outperforms the National Institute of Standards and Technology’s standard Kyber by 47%, the Federal Office for Information Security’s standard Frodo by 99%, and Saber by 57% on the ARM Cortex-M4 platform. Similarly, in hardware, Florete outperforms Frodo and NTRU Prime for all KEM operations. The scheme Espada requires less memory and area than the implementation of most state-of-the-art schemes. For example, the encapsulation algorithm of high-security version Espada uses 30% less stack memory than Kyber, 57% less stack memory than Frodo, and 67% less stack memory than Saber on the ARM Cortex-M4 platform. The implementations of Sable maintain a tradeoff between Florete and Espada regarding software performance and memory requirements. Sable outperforms Saber at least by 6% and Frodo by 99%. Through an efficient polynomial multiplier design, which exploits the small secret size, Sable outperforms most state-of-the-art KEMs, including Saber, Frodo, and NTRU Prime. The implementations of Sable that use number theoretic transform-based polynomial multiplication (SableNTT) surpass all the state-of-the-art schemes in performance, which are optimized for speed on the Cortext M4 platform. The performance benefit of SableNTT against Kyber lies in between 7-29%, 2-13% for Saber, and around 99% for Frodo.
Read full abstract