Abstract
Crystals-Dilithium is one of the digital-signature algorithms in the final round of NIST’s ongoing post-quantum cryptography (PQC) standardization. Security and the computational efficiency of software and hardware implementations are the primary criteria for PQC standardization. Many studies have sought to apply Dilithium efficiently in various environments; however, they focus on conventional PCs and 32-bit Advanced RISC Machine (ARM) processors (Cortex-M4). ARMv8-based processors are more advanced embedded microcontrollers (MCUs) and are widely used in IoT devices, edge computing devices, and On-Board Units in autonomous driving cars. In this study, we present an efficient Crystals-Dilithium implementation on an ARMv8-based MCU. To enhance Dilithium’s performance, we optimize number theoretic transform (NTT)-based polynomial multiplication, the core operation of Dilithium, by leveraging ARMv8’s architectural properties, such as its large register set and the NEON engine. We apply task parallelism to NTT-based polynomial multiplication using the NEON engine. In addition, we reduce the number of memory accesses during NTT-based polynomial multiplication with the proposed merging and register-holding techniques. Finally, we present an interleaved NTT-based multiplication that executes simultaneously on the ARM processor and the NEON engine. This implementation further improves performance by overlapping the ARM processor latency with the NEON overheads. With the proposed optimization methods, for Dilithium 3, we achieved performance improvements of about 43.83% in key pair generation, 113.25% in signing, and 41.92% in verification compared to the reference implementation submitted to the final round of the NIST PQC competition.
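As a rough illustration of the core operation named above, the following Python sketch multiplies two polynomials in Z_q[x]/(x^n + 1) with an NTT and checks the result against schoolbook multiplication. It uses Dilithium's public parameters (q = 8380417, n = 256, and 1753 as a primitive 512th root of unity modulo q). The function names and the recursive, twist-based structure are assumptions made for readability; this is not the merged, register-holding, or NEON-interleaved implementation described in the paper, and it makes no attempt at constant-time or Montgomery arithmetic.

```python
# Minimal sketch of NTT-based negacyclic polynomial multiplication.
# Assumption: a plain recursive Cooley-Tukey NTT, chosen for clarity only.
# Requires Python 3.8+ for pow(x, -1, m).

Q = 8380417   # Dilithium modulus, 2**23 - 2**13 + 1
N = 256       # polynomial degree bound
PSI = 1753    # primitive 512th root of unity mod Q


def ntt(a, omega):
    """Cyclic NTT of list a, where omega is a primitive len(a)-th root of unity."""
    n = len(a)
    if n == 1:
        return a[:]
    omega_sq = omega * omega % Q
    even = ntt(a[0::2], omega_sq)
    odd = ntt(a[1::2], omega_sq)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % Q            # butterfly: twiddle times odd part
        out[k] = (even[k] + t) % Q
        out[k + n // 2] = (even[k] - t) % Q
        w = w * omega % Q
    return out


def negacyclic_mul(a, b):
    """Multiply a and b modulo (x^N + 1, Q) using the NTT."""
    omega = PSI * PSI % Q
    psi_pows = [pow(PSI, i, Q) for i in range(N)]
    # Twisting by powers of PSI turns negacyclic into cyclic convolution.
    a_hat = ntt([x * p % Q for x, p in zip(a, psi_pows)], omega)
    b_hat = ntt([x * p % Q for x, p in zip(b, psi_pows)], omega)
    c_hat = [x * y % Q for x, y in zip(a_hat, b_hat)]  # pointwise product
    c = ntt(c_hat, pow(omega, -1, Q))                  # unscaled inverse NTT
    n_inv = pow(N, -1, Q)
    psi_inv = pow(PSI, -1, Q)
    return [x * n_inv % Q * pow(psi_inv, i, Q) % Q for i, x in enumerate(c)]


def schoolbook_mul(a, b):
    """Reference O(N^2) multiplication modulo (x^N + 1, Q)."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            k = i + j
            if k < N:
                c[k] = (c[k] + a[i] * b[j]) % Q
            else:
                c[k - N] = (c[k - N] - a[i] * b[j]) % Q  # x^N = -1
    return c
```

Calling `negacyclic_mul(a, b)` on two length-256 coefficient lists should return the same list as `schoolbook_mul(a, b)`, while needing on the order of n log n rather than n² coefficient multiplications.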
Highlights
In the communication network field, sensor nodes and devices use cryptographic protocols with digital-signature and key-exchange algorithms to provide integrity and confidentiality [1, 2]
We propose parallel logic for the number theoretic transform (NTT)-based polynomial multiplication algorithm, the core operation of Crystals-Dilithium, that fully utilizes the Advanced RISC Machine (ARM) processor and the NEON engine
We achieved performance improvements of approximately 43.83%, 113.25%, and 41.92% in KeyGen, Sign, and Verify, respectively, for Crystals-Dilithium security level 3, using our NTT-based multiplication optimization method
Summary
In the communication network field, sensor nodes and devices use cryptographic protocols with digital-signature and key-exchange algorithms to provide integrity and confidentiality [1, 2]. With Google's development of a 72-qubit quantum computer, existing cryptographic systems face a fatal threat: in a quantum environment, the integer factorization and discrete logarithm problems that underlie public-key cryptography can be solved in polynomial time using Shor's algorithm [8]. Recognizing this issue, NIST launched a post-quantum cryptography standardization process for key encapsulation mechanisms (KEMs) and digital signatures in 2016 to replace the current international standard public-key cryptography. Except for Classic McEliece and Rainbow, all finalists use lattice-based cryptography.