Abstract

This paper proposes two different methods to perform NTT-based polynomial multiplication in polynomial rings that do not naturally support such a multiplication. We demonstrate these methods on the NTRU Prime key-encapsulation mechanism (KEM) proposed by Bernstein, Chuengsatiansup, Lange, and Vredendaal, which uses a polynomial ring that is, by design, not amenable to use with an NTT. One of our approaches uses Good’s trick and focuses on speed and on supporting more than one parameter set with a single implementation. The other approach uses a mixed-radix NTT and focuses on smaller multipliers and lower memory use. On an ARM Cortex-M4 microcontroller, we show that our three NTT-based implementations, one based on Good’s trick and two mixed-radix NTTs, provide between 32% and 17% faster polynomial multiplication. For the parameter set ntrulpr761, this results in between 16% and 9% faster total operations (sum of key generation, encapsulation, and decapsulation) and requires between 15% and 39% less memory than the current state-of-the-art NTRU Prime implementation on this platform, which uses Toom-Cook-based polynomial multiplication.

Highlights

  • Due to the ongoing advances in quantum computing, the threat posed by quantum computers to IT security is becoming more and more imminent: experts predict that sufficiently large and stable quantum computers running Shor’s algorithm for factorization and solving discrete logarithms may be able to break currently widespread asymmetric cryptographic primitives in the next ten to fifteen years

  • In the research field of Post-Quantum Cryptography (PQC), researchers have been investigating alternative cryptographic schemes that are believed to be secure against attacks aided by quantum computers

  • PQC primitives based on lattice problems have attracted significant attention because their efficient implementations are often on par with, or even better than, those of current cryptographic schemes

Summary

Introduction

Due to the ongoing advances in quantum computing, the threat posed by quantum computers to IT security is becoming more and more imminent: experts predict that sufficiently large and stable quantum computers running Shor’s algorithm for factorization and solving discrete logarithms may be able to break currently widespread asymmetric cryptographic primitives in the next ten to fifteen years. In contrast to other lattice-based schemes, which commonly use either cyclotomic polynomials to enable the use of an NTT or power-of-two moduli for efficient coefficient-wise operations, NTRU Prime is challenging to implement efficiently due to its specific design. This challenge becomes even bigger on embedded devices with limited computational power and memory. We implement two approaches to NTT-based polynomial multiplication for NTRU Prime on the Cortex-M4 architecture and show that they are faster and more memory-efficient than the current state of the art. The current state-of-the-art implementation of NTRU Prime on Cortex-M4 is the work by Yang et al., which was included as the optimized implementation for NTRU Prime in the pqm4 project in April 2020. It uses Toom-Cook for polynomial multiplication, fast modular inversion from [BY19], and platform-specific, hand-written assembly optimizations.
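To make the general idea concrete, the following is a minimal sketch of how an NTT can still be used for a ring of the form Z_q[x]/(x^p - x - 1), the shape of ring used by NTRU Prime: the product is first computed exactly in a larger, NTT-friendly ring and only afterwards reduced modulo q and modulo x^p - x - 1. It is not the implementation described in this paper. It uses a plain recursive radix-2 NTT rather than Good’s trick or a mixed-radix NTT, and the toy parameters (p = 7, q = 17, auxiliary prime Q = 7681) as well as all function names are assumptions chosen purely for illustration.

```python
Q = 7681  # assumed NTT-friendly auxiliary prime: Q - 1 = 2^9 * 15


def find_root(n, mod=Q):
    """Return a primitive n-th root of unity modulo `mod` (n a power of two)."""
    assert (mod - 1) % n == 0
    for g in range(2, mod):
        w = pow(g, (mod - 1) // n, mod)
        if pow(w, n // 2, mod) != 1:  # order is exactly n
            return w
    raise ValueError("no primitive root of unity found")


def ntt(a, w, mod=Q):
    """Recursive radix-2 NTT of `a` using the primitive len(a)-th root `w`."""
    n = len(a)
    if n == 1:
        return a[:]
    w2 = w * w % mod
    even = ntt(a[0::2], w2, mod)
    odd = ntt(a[1::2], w2, mod)
    out = [0] * n
    t = 1
    for i in range(n // 2):
        out[i] = (even[i] + t * odd[i]) % mod
        out[i + n // 2] = (even[i] - t * odd[i]) % mod
        t = t * w % mod
    return out


def center(x, m):
    """Map x to its representative in [-(m-1)/2, (m-1)/2] for odd m."""
    x %= m
    return x - m if x > m // 2 else x


def mul_ntru_prime(f, g, p, q):
    """Multiply f and g in Z_q[x]/(x^p - x - 1) via an NTT modulo Q."""
    n = 1
    while n < 2 * p - 1:  # the NTT must hold the full, unreduced product
        n *= 2
    # Q must be large enough to represent every product coefficient exactly.
    assert p * (q // 2) ** 2 <= Q // 2
    w = find_root(n)
    fa = ntt([center(c, q) % Q for c in f] + [0] * (n - p), w)
    ga = ntt([center(c, q) % Q for c in g] + [0] * (n - p), w)
    ha = [x * y % Q for x, y in zip(fa, ga)]
    # Inverse NTT: transform with w^-1, then scale by n^-1.
    n_inv = pow(n, -1, Q)
    h = [center(c * n_inv, Q) for c in ntt(ha, pow(w, -1, Q))]
    # Reduce modulo x^p - x - 1, using x^i = x^(i-p+1) + x^(i-p) for i >= p.
    for i in range(2 * p - 2, p - 1, -1):
        h[i - p + 1] += h[i]
        h[i - p] += h[i]
    return [c % q for c in h[:p]]
```

For example, with p = 7 and q = 17, multiplying x by x^6 gives x^7, which reduces to x + 1, so `mul_ntru_prime([0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1], 7, 17)` yields the coefficient list of 1 + x. For real parameter sets the auxiliary prime would have to be far larger than in this toy setting, since it must hold the unreduced product coefficients.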

NTRU Prime
Streamlined NTRU Prime
NTRU LPRime
Number Theoretic Transform
Rader’s Trick
Good’s Trick
Approaches
Implementation
Implementation of Mixed-Radix NTT Multiplication
Evaluation
Conclusion
