Abstract
With the recent advances in quantum computing, code-based cryptography is foreseen to be one of the few mathematical solutions to design quantum resistant public-key cryptosystems. The binary polynomial multiplication dominates the computational time of the primitives in such cryptosystems, thus the design of efficient multipliers is crucial to optimize the performance of post-quantum public-key cryptographic solutions. This manuscript presents a flexible template architecture for the hardware implementation of large binary polynomial multipliers. The architecture combines the iterative application of the Karatsuba algorithm, to minimize the number of required partial products, with the Comba algorithm, used to optimize the schedule of their computations. In particular, the proposed multiplier architecture supports operands in the order of dozens of thousands of bits, and it offers a wide range of performance-resources trade-offs that is made independent from the size of the input operands. To demonstrate the effectiveness of our solution, we employed the nine configurations of the LEDAcrypt public-key cryptosystem as representative use cases for large-degree binary polynomial multiplications. For each configuration we showed that our template architecture can deliver a performance-optimized multiplier implementation for each FPGA of the Xilinx Artix-7 mid-range family. The experimental validation performed by implementing our multiplier for all the LEDAcrypt configurations on the Artix-7 12 and 200 FPGAs, i.e., the smallest and the largest devices of the Artix-7 family, demonstrated an average performance gain of 3.6x and 33.3x with respect to an optimized software implementation employing the gf2x C library.
Highlights
The confidentiality provided by widely adopted public-key cryptosystems relies on the hardness of factoring large integers or computing discrete logarithms in a cyclic group
The configurable number of nested Karatsuba iterations allows to control the reduction of the number of partial products to be computed
It is worth noticing that for each considered FPGA we only reported the best multiplier configuration, i.e., the feasible one providing the best performance in terms of the time to compute the entire multiplication
Summary
The confidentiality provided by widely adopted public-key cryptosystems relies on the hardness of factoring large integers or computing discrete logarithms in a cyclic group. Code-based cryptography leverages the hardness of decoding a syndrome obtained with a random linear block code and that of finding a fixed weight code word in the said code [2] Such problems belong to the NP-complete class and it is widely assumed that they cannot have a polynomial time solution even on quantum computers. McEliece proposed the first cryptosystem relying on the hardness of the decoding problem [3], while several following proposals aimed at finding code families with efficient state-space representation to reduce the size of the cryptographic key-pairs. The field’s addition operation corresponds to the logical XOR operation, while the multiplication operation corresponds to the logical AND operation
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.