Abstract

We present implementations of the lattice-based digital signature scheme Dilithium for ARM Cortex-M3 and ARM Cortex-M4. Dilithium is one of the three signature finalists of the NIST post-quantum cryptography competition. As our Cortex-M4 target, we use the popular STM32F407-DISCOVERY development board. Compared to the previous speed records on the Cortex-M4 by Ravi, Gupta, Chattopadhyay, and Bhasin, we speed up the key operations NTT and NTT⁻¹ by 20%, which together with other optimizations results in speedups of 7%, 15%, and 9% for Dilithium3 key generation, signing, and verification, respectively. We also present the first constant-time Dilithium implementation on the Cortex-M3 and use the Arduino Due for benchmarks. For Dilithium3, we achieve on average 2 562 kilocycles for key generation, 10 667 kilocycles for signing, and 2 321 kilocycles for verification. Additionally, we present stack consumption optimizations applying to both our Cortex-M3 and Cortex-M4 implementations. Due to the iterative nature of the Dilithium signing algorithm, there is no optimal way to achieve the best speed and the lowest stack consumption at the same time. We present three different strategies for the signing procedure which allow trading more stack and flash memory for higher speed or vice versa. Our implementation of Dilithium3 with the smallest memory footprint uses less than 12 kB. As an additional output of this work, we present the first Cortex-M3 implementations of the key-encapsulation schemes NewHope and Kyber.

Highlights

  • In 2016, NIST called for proposals for new post-quantum schemes [NIS16] which are meant to replace the existing standards for key establishment (SP 800-56A [BCR+18] and SP 800-56B [BCR+19]) and digital signatures (FIPS 186-4 [Nat13])

  • Dilithium [DKL+18, LDK+19] is a digital signature scheme based on the hardness of the M-LWE and the M-SIS lattice problems

  • Our Cortex-M4 implementation is based on the Dilithium implementation by Ravi, Gupta, Chattopadhyay, and Bhasin [RGCB19], which includes the number theoretic transform (NTT) and NTT⁻¹ assembly implementation of Güneysu, Krausz, Oder, and Speith [GKOS18]

Summary

Introduction

In 2016, NIST called for proposals for new post-quantum schemes [NIS16] which are meant to replace the existing standards for key establishment (SP 800-56A [BCR+18] and SP 800-56B [BCR+19]) and digital signatures (FIPS 186-4 [Nat13]). An implementation of any scheme working on large (secret) integers compiled for the Cortex-M3 is most likely going to leak information about these secret integers via timing side channels. This has been shown to pose a problem for cryptographic schemes in preceding ARM architectures [GOPT09]. Gérard, Tibouchi, and Fouque propose to use a power-of-two modulus instead of the original prime modulus to allow for cheaper masking. Strictly speaking, they do not implement the Dilithium scheme as it was submitted to NIST. In Appendix A, we provide performance results for Kyber and NewHope on the Cortex-M3, which are a by-product of this work.
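To make the timing concern concrete: on the Cortex-M3, the long multiplication instructions (such as SMULL and SMLAL) take a data-dependent number of cycles, whereas the 32-bit MUL does not. The following is a minimal sketch, not the authors' assembly, of one generic way to avoid long multiplications: building a 32x32 -> 64-bit product from four 16x16 -> 32-bit products. The function name `mul32x32_split` is hypothetical, the sketch handles unsigned operands only, and a C compiler is not guaranteed to emit only constant-time instructions for it, so real implementations typically resort to hand-written assembly.

```c
#include <stdint.h>

/* Hypothetical helper (not from the paper): computes the full 64-bit
 * product a * b using only 16x16 -> 32-bit multiplications, each of
 * which fits in a plain 32-bit multiply. */
uint64_t mul32x32_split(uint32_t a, uint32_t b)
{
    /* Split both operands into 16-bit halves. */
    uint32_t a_lo = a & 0xFFFFu, a_hi = a >> 16;
    uint32_t b_lo = b & 0xFFFFu, b_hi = b >> 16;

    /* Four partial products; each is at most 0xFFFE0001 and so
     * cannot overflow 32 bits. */
    uint32_t ll = a_lo * b_lo;
    uint32_t lh = a_lo * b_hi;
    uint32_t hl = a_hi * b_lo;
    uint32_t hh = a_hi * b_hi;

    /* Middle column plus the carry out of the low column.
     * The sum can exceed 32 bits, so it is accumulated in 64 bits. */
    uint64_t mid = (uint64_t)lh + hl + (ll >> 16);

    /* Recombine: hh * 2^32 + mid * 2^16 + low 16 bits of ll. */
    return ((uint64_t)hh << 32) + (mid << 16) + (ll & 0xFFFFu);
}
```

The price of this approach is four multiplications plus additions in place of a single instruction, which illustrates why constant-time arithmetic on the Cortex-M3 tends to be noticeably slower than on the Cortex-M4.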

Dilithium
Target Platforms
Improving the Performance on Cortex-M4
Fast Constant-Time NTTs on Cortex-M3
SMULL and SMLAL
Cooley–Tukey and Gentleman–Sande Butterflies
Time-Memory Trade-Offs
Strategy 1: A in Flash
Strategy 2: A in SRAM
Strategy 3
Splitting signature generation in an offline and online phase
Results
NTT performance
Cortex-M4 performance
Cortex-M3 performance
Stack usage
Profiling
A Kyber and NewHope on Cortex-M3