Abstract

We repurpose existing RSA/ECC co-processors for (ideal) lattice-based cryptography by exploiting the availability of fast long integer multiplication. Such co-processors are deployed in smart cards in passports and identity cards, secured microcontrollers, and hardware security modules (HSMs). In particular, we demonstrate an implementation of a variant of the Module-LWE-based Kyber Key Encapsulation Mechanism (KEM) that is tailored for high performance on a commercially available smart card chip (SLE 78). To benefit from the RSA/ECC co-processor, we use Kronecker substitution in combination with schoolbook and Karatsuba polynomial multiplication. Moreover, we speed up symmetric operations in our Kyber variant using the AES co-processor to implement a PRNG and a SHA-256 co-processor to realise hash functions. This allows us to execute CCA-secure Kyber768 key generation in 79.6 ms, encapsulation in 102.4 ms and decapsulation in 132.7 ms.

Highlights

  • The development of an efficient quantum order-finding algorithm by Shor [Sho97] invalidated the quantum hardness of factoring and discrete logarithms in Abelian groups

  • In this work we have shown that fast post-quantum cryptography is feasible on current smart card platforms

  • On a commercially available device it is possible to obtain a significant speedup of the arithmetic of lattice-based cryptography by reusing already existing co-processors dedicated to the acceleration of RSA or ECC

Summary

Introduction

The development of an efficient quantum order-finding algorithm by Shor [Sho97] invalidated the quantum hardness of factoring and discrete logarithms in Abelian groups. In the smart-card setting, low-power general-purpose 16- or 32-bit CPUs are commonly augmented by cryptographic co-processors capable of executing Diffie-Hellman key exchanges, encryptions or signatures based on RSA or elliptic curves. We repurpose existing cryptographic co-processors to accelerate lattice-based cryptography. For this we make use of variants of Kronecker substitution combined with low-degree polynomial arithmetic. Microcontrollers and embedded processors usually have only a very limited amount of available RAM and space to store program code, and operate with relatively simple 8-, 16-, or 32-bit processor architectures. They are sometimes referred to as constrained devices and are mostly used in embedded applications where low energy consumption, reduced device costs, and other aspects like real-time capabilities are required. What has received comparatively less attention in the literature so far are flexible cryptographic co-processors for lattice-based cryptography in the spirit of RSA or ECC co-processors (cf. [SBPV07]) and instruction set extensions (cf. a multiply-accumulate instruction [Wen13]).
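To make the role of Kronecker substitution concrete, the following is a minimal sketch of the basic idea only: two polynomials are packed into long integers by evaluating them at a power of two, multiplied with a single integer multiplication (the operation an RSA/ECC co-processor accelerates), and the product coefficients are read back out of the bit slots. The function names, the restriction to non-negative coefficients, and the toy parameters are assumptions made for illustration; this is not the compact KS1/KS2 realisation from the paper, and Python's built-in integers stand in for the co-processor.

```python
def pack(coeffs, bits):
    """Evaluate the polynomial at 2**bits: coefficient i occupies bit slot i."""
    value = 0
    for i, c in enumerate(coeffs):
        value += c << (bits * i)
    return value


def unpack(value, bits, count):
    """Read `count` non-negative coefficients back out of their bit slots."""
    mask = (1 << bits) - 1
    return [(value >> (bits * i)) & mask for i in range(count)]


def kronecker_mul(a, b, q, n):
    """Multiply a*b in Z_q[X]/(X^n + 1) with one long integer multiplication.

    Assumes (for this sketch) that a and b have n coefficients in [0, q).
    """
    # Each product coefficient is a sum of at most n terms, each at most
    # (q-1)^2, so this slot width guarantees that slots never overflow.
    bits = (n * (q - 1) ** 2).bit_length()
    product = pack(a, bits) * pack(b, bits)  # the only "big" multiplication
    full = unpack(product, bits, 2 * n - 1)
    # Reduce modulo X^n + 1, using X^(n+i) = -X^i, then modulo q.
    high = full[n:] + [0]
    return [(full[i] - high[i]) % q for i in range(n)]


def schoolbook_mul(a, b, q, n):
    """Reference negacyclic schoolbook multiplication, used for comparison."""
    result = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                result[i + j] = (result[i + j] + a[i] * b[j]) % q
            else:
                result[i + j - n] = (result[i + j - n] - a[i] * b[j]) % q
    return result
```

For instance, `kronecker_mul([1, 2], [3, 4], 17, 2)` multiplies (1 + 2X)(3 + 4X) modulo X^2 + 1 and 17 and should agree with `schoolbook_mul` on the same inputs. On a real co-processor the operand length is bounded, which is why the paper combines the substitution with schoolbook and Karatsuba multiplication over lower-degree pieces rather than packing a whole polynomial at once.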

Preliminaries
Hard problems
Target platform
Kronecker
Compact Kronecker
Splitting the ring
Implementation
Description of Kyber using Kronecker
Implementation of Kyber on SLE 78
Realisation of KyberMulAdd with KS1
Realisation of KyberMulAdd with KS2
Realisation of MulAdd for other RLWE-based schemes
Implementation performance
Comparison with related work
Conclusion and future work
A Cyclotomic gadgets
B Proof of Concept
Findings
We add a number of size eta in absolute value