Abstract

Lattice-based cryptography performs polynomial multiplication using the Number Theoretic Transform (NTT), which reduces the complexity of polynomial multiplication from $O\left(n^{2}\right)$ to $O(n \log n)$. The NTT has been at the center of cryptographic research, as it is used in many schemes, including hash functions, homomorphic encryption, key-encapsulation mechanisms, and digital signatures. A common approach to rapid hardware design starts from a software description that is translated semi-automatically into hardware, as supported by the Xilinx High-Level Synthesis (HLS) toolchain or similar tools. Most of the time this approach requires careful modifications (e.g. code modification, loop reordering, loop flattening, removing dependencies, loop pipelining, loop unrolling) to achieve performance comparable to a hand-crafted Register-Transfer Level (RTL) design. This paper proposes a design solution that resolves the data and loop-carried dependencies of the Cooley-Tukey NTT algorithm, helping the HLS synthesizer produce designs that are efficient in terms of latency and resources. The proposed work has been evaluated on the NTT of the Dilithium digital-signature scheme ($n=256$, 23-bit modulus $Q$), and is shown to achieve a 20-50% improvement in latency, with no significant change in resource usage, compared to existing HLS-based NTT solutions in the literature.
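
To make the loop structure mentioned above concrete, the following is a minimal software reference sketch of an iterative Cooley-Tukey NTT, written in Python rather than HLS C/C++. It is not the design proposed in the paper. It assumes the Dilithium modulus $Q = 8380417$ and the root $\zeta = 1753$ from the Dilithium specification, and for simplicity computes a plain cyclic transform with $\omega = \zeta^{2}$ instead of Dilithium's negacyclic variant. The function names are hypothetical. The comments mark the kind of loop-carried dependency an HLS tool would have to resolve before pipelining the inner loop.

```python
# Reference model only: a cyclic Cooley-Tukey NTT over Z_Q, not the paper's HLS design.
Q = 8380417               # Dilithium modulus, 2**23 - 2**13 + 1 (23 bits)
N = 256                   # number of polynomial coefficients
ZETA = 1753               # primitive 512-th root of unity mod Q (Dilithium spec)
OMEGA = pow(ZETA, 2, Q)   # assumption for this sketch: primitive 256-th root


def bit_reverse(values):
    """Return a copy of values permuted into bit-reversed index order."""
    n = len(values)
    bits = n.bit_length() - 1          # n is assumed to be a power of two >= 2
    out = [0] * n
    for i, v in enumerate(values):
        rev = int(format(i, f"0{bits}b")[::-1], 2)
        out[rev] = v
    return out


def ntt_cooley_tukey(values, omega=OMEGA, q=Q):
    """Iterative decimation-in-time NTT using O(n log n) butterflies."""
    a = bit_reverse(values)
    n = len(a)
    length = 2
    while length <= n:                         # log2(n) stages
        w_len = pow(omega, n // length, q)     # primitive length-th root of unity
        half = length // 2
        for start in range(0, n, length):      # one block per group of butterflies
            w = 1
            for j in range(half):              # butterflies inside the block
                u = a[start + j]
                v = a[start + j + half] * w % q
                a[start + j] = (u + v) % q
                a[start + j + half] = (u - v) % q
                # Loop-carried dependency: the next iteration needs this w.
                # A precomputed twiddle table is one common way to remove it.
                w = w * w_len % q
        length *= 2
    return a


def ntt_naive(values, omega=OMEGA, q=Q):
    """Direct O(n^2) evaluation, used only to cross-check the fast version."""
    n = len(values)
    return [
        sum(values[i] * pow(omega, i * k, q) for i in range(n)) % q
        for k in range(n)
    ]
```

Calling `ntt_cooley_tukey(poly)` on a list of 256 coefficients in $[0, Q)$ should give the same result as `ntt_naive(poly)`. Besides the running twiddle factor `w`, each stage reads the values written by the previous stage, which is the data dependency between stages that constrains how aggressively the loops can be pipelined or unrolled.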
