Number Theoretic Transform (NTT) is useful for the acceleration of polynomial multiplication, which is the main performance bottleneck in the next-generation cryptographic schemes. Different NTT based cryptographic algorithms have different security settings. The diverse application scenarios introduce different cost-performance trade-offs and hardware constraints. Motivated by the emerging demand for more versatile NTT hardware accelerators, we propose a new design methodology that can generate area-efficient and high-performance NTT accelerators for any length and modulus of NTT polynomials and single processing element (PE) or PE array with a varying number of layers. The proposed NTT accelerator architecture pivots on a conflict-free memory access pattern for adaptation to different combinations of security and PE array configuration parameters. The proposed memory access pattern is formally proved to be conflict-free for any parametric configurations. The criterion for read-after-write conflict without pipeline stall is also established. Our proposed design methodology can produce NTT accelerators with single PE or multi-layer PE array for different polynomial size and modulus, with hardware area and computational efficiency comparable to accelerators customized for a fixed set of parameters. Our proposed methodology produces parameterized accelerator with higher scalability than the existing parameterized accelerator design. On average, the accelerators generated by our proposed method are 71.4% more area-time efficient. Up to 30.7% area-time reduction over the most area-time efficient state-of-the-art scalable NTT accelerator can be achieved for the same security parameters.
Read full abstract