This paper presents a full hardware implementation of the eXtended Merkle Signature Scheme (XMSS), a NIST approved and IETF RFC specified post-quantum cryptography (PQC) algorithm. An optimized node traversal is proposed to enable efficient memory utilization without compromising the computational latency of the L-tree and Merkle tree construction, which are two key components used for the compression of the Winternitz One-Time Signature (WOTS) public key in XMSS. The computation of the authentication path during signature generation has also been significantly sped up by our proposed hardware implementation of the Buchmann, Dahmen, and Schneider (BDS) algorithm. Our implementation has completely avoided the use of block random-access memory, which is known to be vulnerable to side-channel attacks. The memory requirement has been highly optimized for implementation with small flip-flop chains and register counters as pointers for fast data access. To the best of our knowledge, this is the first full hardware implementation of all three <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">key generation</i> , <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">signing</i> and <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">verification</i> operations of XMSS. The design has been prototyped and evaluated on a 28 nm FPGA platform to demonstrate its performance improvements over the most efficient software and hardware/software co-design methods reported to date. Specifically, it increases the computational efficiency of the best reported XMSS implementation for <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">key generation</i> and <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">signature generation</i> by about 20% and 50%, respectively. It can also run at 10% higher clock speed than the fastest hardware implementation of <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">signature verification</i> in FPGA with 8% lower hardware resource utilization.
Read full abstract