Abstract
We repurpose existing RSA/ECC co-processors for (ideal) lattice-based cryptography by exploiting the availability of fast long integer multiplication. Such co-processors are deployed in smart cards in passports and identity cards, secured microcontrollers, and hardware security modules (HSMs). In particular, we demonstrate an implementation of a variant of the Module-LWE-based Kyber Key Encapsulation Mechanism (KEM) that is tailored for high performance on a commercially available smart card chip (SLE 78). To benefit from the RSA/ECC co-processor, we use Kronecker substitution in combination with schoolbook and Karatsuba polynomial multiplication. Moreover, we speed up symmetric operations in our Kyber variant by using the AES co-processor to implement a PRNG and a SHA-256 co-processor to realise hash functions. This allows us to execute CCA-secure Kyber768 key generation in 79.6 ms, encapsulation in 102.4 ms and decapsulation in 132.7 ms.
Highlights
The development of an efficient quantum order-finding algorithm by Shor [Sho97] invalidated the quantum hardness of factoring and discrete logarithms in Abelian groups
In this work we have shown that fast post-quantum cryptography is feasible on current smart card platforms
On a commercially available device it is possible to obtain a significant speedup of the arithmetic of lattice-based cryptography by reusing already existing co-processors dedicated to the acceleration of RSA or ECC
Summary
The development of an efficient quantum order-finding algorithm by Shor [Sho97] invalidated the quantum hardness of factoring and discrete logarithms in Abelian groups. In the smart-card setting, low-power general-purpose 16- or 32-bit CPUs are commonly augmented by cryptographic co-processors capable of executing Diffie-Hellman key exchanges, encryptions or signatures based on RSA or elliptic curves. We repurpose existing cryptographic co-processors to accelerate lattice-based cryptography. For this we make use of variants of Kronecker substitution combined with low-degree polynomial arithmetic. Microcontrollers and embedded processors usually have only a very limited amount of RAM and space to store program code, and operate with relatively simple 8-, 16-, or 32-bit processor architectures. They are sometimes referred to as constrained devices and are mostly used in embedded applications where low energy consumption, reduced device costs, and other aspects like real-time capabilities are required. What has received comparatively less attention in the literature so far are flexible cryptographic co-processors for lattice-based cryptography in the spirit of RSA or ECC co-processors (cf. [SBPV07]) and instruction set extensions (cf. a multiply-accumulate instruction [Wen13]).
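The idea behind Kronecker substitution can be illustrated with a minimal sketch. It packs the coefficients of two polynomials into two long integers, performs a single long integer multiplication, and unpacks the product coefficients again. This is the plain textbook form, not necessarily one of the variants used in the implementation; Python's arbitrary-precision `int` merely stands in for the co-processor's long multiplier, and the function names, the modulus `q = 3329`, and the reduction modulo x^n + 1 are illustrative assumptions rather than details taken from the text above.

```python
def pack(poly, k):
    """Evaluate poly at x = 2**k, so coefficient i occupies bits [i*k, (i+1)*k)."""
    acc = 0
    for c in reversed(poly):
        acc = (acc << k) + c
    return acc


def unpack(value, k, count):
    """Split an integer into `count` chunks of k bits each, lowest chunk first."""
    mask = (1 << k) - 1
    return [(value >> (i * k)) & mask for i in range(count)]


def kronecker_mul(a, b, q):
    """Multiply a and b modulo (x^n + 1, q) with one long integer multiplication.

    Assumes len(a) == len(b) == n and all coefficients lie in [0, q).
    """
    n = len(a)
    assert len(b) == n
    # Every product coefficient is a sum of at most n terms, each at most
    # (q-1)^2, so k bits per slot guarantee that slots never overflow.
    k = (n * (q - 1) ** 2).bit_length()
    product = pack(a, k) * pack(b, k)  # the single long multiplication
    full = unpack(product, k, 2 * n - 1)
    # Reduce modulo x^n + 1: x^(n+i) wraps around to -x^i.
    high = full[n:] + [0]
    return [(full[i] - high[i]) % q for i in range(n)]


def schoolbook_mul(a, b, q):
    """Reference schoolbook multiplication modulo (x^n + 1, q)."""
    n = len(a)
    res = [0] * n
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            if i + j < n:
                res[i + j] = (res[i + j] + ai * bj) % q
            else:
                res[i + j - n] = (res[i + j - n] - ai * bj) % q
    return res
```

For example, `kronecker_mul([1, 2, 3, 4], [5, 6, 7, 8], 3329)` agrees with `schoolbook_mul` on the same inputs. On a real device the slot width `k` times the number of coefficients is bounded by the operand size of the multiplier, which is one reason to combine the substitution with schoolbook or Karatsuba multiplication over shorter pieces.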