Area-time Efficiency Research Articles

Saber is a Round 3 candidate in the NIST Post-Quantum Cryptography Standardization process. Polynomial convolution is one of the most computationally intensive operations in the Saber Key Encapsulation Mechanism (KEM); it can be performed with widely explored algorithms such as the schoolbook polynomial multiplication algorithm (SPMA) and the Number Theoretic Transform (NTT). While an SPMA multiplier has high latency, an NTT-based multiplier usually requires a large amount of hardware. In this work, we propose KaratSaber, an optimized Karatsuba polynomial multiplier architecture with balanced hardware efficiency (throughput-per-slice, TPS) compared to NTT- and SPMA-based designs. KaratSaber employs three techniques for an efficient design: a parallel grid input technique that makes the pre-processing stage of the Karatsuba-based polynomial multiplier efficient; a novel instruction-code result-mapping technique that caters to negacyclic operations and improves the efficiency of the post-processing stage; and a double-multiplicand, shifter-based multiplier that doubles the throughput of the multiplication stage. Combining these three techniques, the proposed KaratSaber architecture is 7.47× faster than the state-of-the-art SPMA Saber architecture at the expense of 4.96× additional hardware resources, making KaratSaber 46.04% more area-time efficient. Compared to LWRPro, a recent Karatsuba Saber architecture, KaratSaber achieves 2.11× higher throughput while using only 1.92× additional hardware, a 10.44% improvement in area-time efficiency.
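To make the multiplication described in the abstract above concrete, here is a minimal software sketch of Karatsuba multiplication followed by negacyclic reduction modulo x^N + 1. It is an illustration only, not the KaratSaber hardware architecture: the parameters N = 256 and q = 2^13 are assumed from the Saber specification, and the function names and the recursion threshold are hypothetical choices made for this example.

```python
import random

Q = 1 << 13   # assumed Saber modulus q = 2^13
N = 256       # assumed Saber polynomial length


def schoolbook(a, b):
    """Plain O(n^2) product of two length-n polynomials (2n-1 coefficients)."""
    n = len(a)
    c = [0] * (2 * n - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c


def karatsuba(a, b, threshold=16):
    """Recursive Karatsuba product; len(a) == len(b) must be a power of two."""
    n = len(a)
    if n <= threshold:
        return schoolbook(a, b)
    h = n // 2
    a0, a1 = a[:h], a[h:]
    b0, b1 = b[:h], b[h:]
    low = karatsuba(a0, b0, threshold)                  # a0 * b0
    high = karatsuba(a1, b1, threshold)                 # a1 * b1
    mid = karatsuba([x + y for x, y in zip(a0, a1)],    # (a0 + a1)(b0 + b1)
                    [x + y for x, y in zip(b0, b1)], threshold)
    c = [0] * (2 * n - 1)
    for i in range(2 * h - 1):
        c[i] += low[i]
        c[i + h] += mid[i] - low[i] - high[i]           # cross terms
        c[i + 2 * h] += high[i]
    return c


def negacyclic_reduce(c, n, q):
    """Fold a product modulo x^n + 1 (so x^n = -1) and reduce coefficients mod q."""
    r = c[:n]
    for i in range(n, len(c)):
        r[i - n] -= c[i]
    return [x % q for x in r]


if __name__ == "__main__":
    random.seed(0)
    a = [random.randrange(Q) for _ in range(N)]
    b = [random.randrange(Q) for _ in range(N)]
    fast = negacyclic_reduce(karatsuba(a, b), N, Q)
    slow = negacyclic_reduce(schoolbook(a, b), N, Q)
    print(fast == slow)
```

Each Karatsuba level replaces four half-size products with three, at the cost of extra additions before and after the multiplications; these correspond loosely to the pre-processing and post-processing stages the abstract mentions.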

Modular multiplication is the most time-consuming operation in number-theoretic cryptographic algorithms involving large integers, such as RSA and Diffie-Hellman. Implementations reveal that more than 75 percent of the time is spent in the modular multiplication function within RSA for moduli larger than 1,024 bits. Fast multiplier architectures exist that minimize delay and increase throughput using parallelism and pipelining. However, such designs are large in terms of area and low in efficiency. In this paper, we integrate the fast Fourier transform (FFT) method into McLaughlin's framework and present an improved FFT-based Montgomery modular multiplication (MMM) algorithm that achieves high area-time efficiency. Compared to previous FFT-based designs, we avoid the zero-padding operation by computing the modular multiplication steps directly using cyclic and nega-cyclic convolutions. Thus, we reduce the convolution length by half. Furthermore, supported by the number-theoretic weighted transform, the FFT algorithm is used to provide fast convolution computation. We also introduce a general method for efficient parameter selection for the proposed algorithm. Architectures with single and double butterfly structures are designed to obtain low area-latency solutions, which we implemented on Xilinx Virtex-6 FPGAs. The results show that our work offers better area-latency efficiency than the state-of-the-art FFT-based MMM architectures for operand sizes of 1,024 bits and above. We have obtained area-latency efficiency improvements of up to 50.9 percent for 1,024-bit, 41.9 percent for 2,048-bit, 37.8 percent for 4,096-bit and 103.2 percent for 7,680-bit operands. Furthermore, our design also achieves lower operating latency, at a high clock frequency, for transforms of length 64 and above.
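The claim that cyclic and nega-cyclic convolutions remove the need for zero-padding can be illustrated with a small sketch. It shows only the underlying identity: for two length-n sequences, the length-n cyclic and nega-cyclic convolutions together carry the same information as the zero-padded length-2n product. It is not the paper's MMM algorithm, it uses naive O(n^2) loops rather than an FFT or weighted transform, and all function names are hypothetical.

```python
def linear_conv(a, b):
    """Full product of two length-n sequences, padded to length 2n."""
    n = len(a)
    c = [0] * (2 * n)
    for i in range(n):
        for j in range(n):
            c[i + j] += a[i] * b[j]
    return c


def cyclic_conv(a, b):
    """Length-n convolution with wrap-around (product modulo x^n - 1)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] += a[i] * b[j]
    return c


def negacyclic_conv(a, b):
    """Length-n convolution with negated wrap-around (product modulo x^n + 1)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] += a[i] * b[j]
            else:
                c[k - n] -= a[i] * b[j]
    return c


def recombine(cyc, neg):
    """Recover the low and high halves of the full product.

    cyc = lo + hi and neg = lo - hi, so both divisions below are exact.
    """
    lo = [(x + y) // 2 for x, y in zip(cyc, neg)]
    hi = [(x - y) // 2 for x, y in zip(cyc, neg)]
    return lo + hi


if __name__ == "__main__":
    a = [1, 2, 3, 4]
    b = [5, 6, 7, 8]
    full = recombine(cyclic_conv(a, b), negacyclic_conv(a, b))
    print(full == linear_conv(a, b))
```

Both length-n convolutions operate on the unpadded inputs, which is the sense in which the transform length is halved relative to a zero-padded length-2n convolution.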

Articles published on Area-time Efficiency

KaratSaber: New Speed Records for Saber Polynomial Multiplication using Efficient Karatsuba FPGA Architecture

An area-time efficient point-multiplication architecture for ECC over $GF(2^m)$ using polynomial basis

High-Speed RLWE-Oriented Polynomial Multiplier Utilizing Karatsuba Algorithm

Area–Time-Efficient Code-Based Postquantum Key Encapsulation Mechanism on FPGA

An area-efficient ECC architecture over GF([formula omitted]) for resource-constrained applications

Novel Bit-Parallel and Digit-Serial Systolic Finite Field Multipliers Over $GF(2^m)$ Based on Reordered Normal Basis

Highly efficient $GF(2^8)$ inversion circuit based on hybrid GF representations

FFT-based McLaughlin's Montgomery Exponentiation without Conditional Selections

Area-Time Efficient Architecture of FFT-Based Montgomery Multiplication

An Area Time-Efficient Structure to Find the Approximate First Two Minima for Min-Sum-Based LDPC Decoders

One Minimum Only Trellis Decoder for Non-Binary Low-Density Parity-Check Codes

Baby Corn-Legumes Intercropping Systems: I. Yields, Resource Utilization Efficiency, and Soil Health

EXTENDED COMPATIBILITY PATH BASED HARDWARE BINDING: AN ADAPTIVE ALGORITHM FOR HIGH LEVEL SYNTHESIS OF AREA-TIME EFFICIENT DESIGNS

A Novel High-Speed 54×54 bit Multiplier

Unified Systolic-Like Architecture for DCT and DST Using Distributed Arithmetic

Efficient Software-Implementation of Finite Fields with Applications to Cryptography

Efficiency Analysis for a Mixed-Signal Focal Plane Processing Architecture

Self-timed carry-lookahead adders

Regular, area-time efficient carry-lookahead adders
