Corrections to “Efficient Fault-Detection Architectures for Barrett Reduction and Multiplication in Classical and Post-Quantum Cryptographic Systems”
- Conference Article
54
- 10.1109/fpl.2005.1515780
- Oct 10, 2005
Modular multiplication is the computational foundation of most public-key cryptosystems, so improving its efficiency directly improves the efficiency of the whole cryptosystem. This paper presents an implementation and comparison of three recently proposed, highly efficient architectures for modular multiplication on FPGAs: interleaved modular multiplication and two variants of Montgomery modular multiplication. This (first) hardware implementation of these designs shows their relative performance regarding area and speed. One of the main findings is that interleaved multiplication has the smallest area-time product of all investigated architectures. As a typical cryptographic application, we show that a 1024-bit RSA exponentiation can be performed in less than 6.1 ms at a clock rate of 69 MHz on a Xilinx Virtex FPGA.
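As a rough illustration of the interleaved modular multiplication named in the abstract above, the following Python sketch shows the textbook shift-add-reduce loop. It is a software model under stated assumptions, not the paper's FPGA architecture; the function name `interleaved_modmul` and the parameter `n_bits` are hypothetical.

```python
def interleaved_modmul(x: int, y: int, m: int, n_bits: int) -> int:
    """Return (x * y) mod m, assuming 0 <= x, y < m < 2**n_bits.

    Textbook interleaved scheme: scan x from its most significant bit,
    doubling the partial result, conditionally adding y, and reducing
    after every step so the partial result never grows beyond 3*m.
    """
    if not (0 <= x < m and 0 <= y < m and m < (1 << n_bits)):
        raise ValueError("operands must satisfy 0 <= x, y < m < 2**n_bits")
    p = 0
    for i in reversed(range(n_bits)):
        p <<= 1                  # double the partial product
        if (x >> i) & 1:
            p += y               # add y when the current bit of x is set
        # Here p < 3*m, so at most two subtractions restore p < m.
        if p >= m:
            p -= m
        if p >= m:
            p -= m
    return p


if __name__ == "__main__":
    print(interleaved_modmul(123, 456, 1009, 10))
```

Reducing inside the loop keeps every intermediate value only a couple of bits wider than the modulus, which is what makes this scheme attractive for hardware compared with a full-width multiply followed by a division.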
- Research Article
2
- 10.1016/j.microrel.2023.115277
- Nov 17, 2023
- Microelectronics Reliability
Performance optimized approximate multiplier architecture ST-AxM - based on statistical analysis and static compensation
- Conference Article
3
- 10.1145/2858930.2858935
- Jan 20, 2016
Efficient implementation of double point multiplication is crucial for elliptic curve cryptographic systems. We propose efficient algorithms and architectures for the computation of double point multiplication on binary elliptic curves and provide a comparative analysis of their performance at the 112-bit security level. To the best of our knowledge, this is the first work in the literature that considers the design and implementation of simultaneous computation of double point multiplication. We first provide algorithms for the three main double point multiplication methods. Then, we perform data-flow analysis and propose hardware architectures for the presented algorithms. Finally, we implement the proposed state-of-the-art architectures on an FPGA platform for comparison purposes and report the area and timing results. Our results indicate that differential addition chain based algorithms are better suited to computing double point multiplication over binary elliptic curves for high-performance applications.
- Conference Article
- 10.1145/3056662.3056676
- Feb 26, 2017
Finite field arithmetic has been extensively used in error-correcting codes and cryptography. Among the arithmetic operations of finite fields, multiplication is an important building block of various applications. This is because time-consuming operations such as exponentiation, division, and multiplicative inversion can be decomposed into repeated AB or AB^2 multiplications. Therefore, we require an efficient algorithm and architecture for multiplication over finite fields. In this paper, we propose an efficient Montgomery AB^2 multiplier over finite fields defined by irreducible all-one polynomials. The proposed AB^2 multiplier has lower space and time complexities than related multipliers. Compared to the corresponding existing structures, the proposed AB^2 multiplier saves at least 59% area, 50% time, and 79% area-time (AT) complexity. Accordingly, it is well suited for VLSI implementation and can be easily applied as a basic component for computing complex operations over finite fields.
- Conference Article
11
- 10.1109/spin.2017.8049986
- Feb 1, 2017
Logarithmic Number System (LNS) based multipliers play a significant role in Digital Signal Processing (DSP), image processing, and neural networks, all of which require many arithmetic operations. Among all arithmetic operations, multiplication is the most hardware-consuming component. Here, we give a possible solution to that problem by using an efficient VLSI architecture of a Mitchell's algorithm based iterative logarithmic multiplier with a seamless pipelined technique. The proposed work is based on hardware minimization at the same error cost as previously reported architectures. We use VHDL to design the existing and proposed Mitchell's algorithm based iterative logarithmic multipliers. Both multiplier designs are evaluated with the Synopsys Design Compiler using 90 nm CMOS technology, and the results are compared in terms of Data Arrival Time (DAT), Area, Power, Area Delay Product (ADP), and EPS (Energy per Sample). The proposed design achieves 30.99 %, 31.10 %, and 20.84 % less ADP and 5.12 %, 15.48 %, and 23.55 % less EPS than the existing Mitchell's algorithm based iterative logarithmic multiplier for 8-bit, 16-bit, and 32-bit operations, respectively.
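To make the idea behind a Mitchell-style iterative logarithmic multiplier concrete, here is a minimal Python sketch of the basic-block iteration as it is commonly described: each operand is split into its leading power of two plus a residue, the product is approximated with shifts and adds, and the neglected residue product is approximated again in the next iteration. This is a behavioural model under those assumptions, not the architecture evaluated in the abstract above; `ilm_multiply` and `iterations` are hypothetical names.

```python
def ilm_multiply(a: int, b: int, iterations: int = 1) -> int:
    """Approximate a * b for non-negative integers using shifts and adds.

    With a = 2**k1 + r1 and b = 2**k2 + r2, the exact product is
    2**(k1+k2) + r1*2**k2 + r2*2**k1 + r1*r2. Each iteration adds the
    first three terms and defers r1*r2 to the next iteration.
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    total = 0
    for _ in range(iterations):
        if a == 0 or b == 0:
            break                      # remaining residue product is zero
        k1 = a.bit_length() - 1        # position of the leading one in a
        k2 = b.bit_length() - 1        # position of the leading one in b
        r1 = a - (1 << k1)             # residue of a below its leading one
        r2 = b - (1 << k2)             # residue of b below its leading one
        total += (1 << (k1 + k2)) + (r1 << k2) + (r2 << k1)
        a, b = r1, r2                  # next iteration approximates r1 * r2
    return total


if __name__ == "__main__":
    print(ilm_multiply(5, 6, iterations=1))  # one basic block
    print(ilm_multiply(5, 6, iterations=2))  # one correction term added
```

In this model the result never exceeds the true product, and each extra iteration trades one more pass through the basic block for a smaller error, which is the area/accuracy trade-off such designs expose.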
- Research Article
8
- 10.3390/electronics9122126
- Dec 12, 2020
- Electronics
This work presents an efficient high-speed hardware architecture for point multiplication (PM) computation of Elliptic-curve cryptography using binary fields over GF(2^163) and GF(2^571). The efficiency is achieved by reducing: (1) the time required for one PM computation and (2) the total number of required clock cycles. The required computational time for one PM computation is reduced by incorporating two modular multipliers (connected in parallel), a serially connected adder after multipliers and two serially connected squarer units (one after the first multiplier and another after the adder). To optimize the total number of required clock cycles, the point addition and point double instructions for PM computation of the Montgomery algorithm are re-structured. The implementation results after place-and-route over GF(2^163) and GF(2^571) on a Xilinx Virtex-7 FPGA device reveal that the proposed high-speed architecture is well-suited for the network-related applications, where millions of heterogeneous devices want to connect with the unsecured internet to reach an acceptable performance.
- Research Article
5
- 10.14419/ijet.v7i2.16.11410
- Apr 12, 2018
- International Journal of Engineering & Technology
Multiplication is one of the most important arithmetic components for digital signal processing, neural networks, and image processing. However, it is well known that the multiplier is the most hardware-consuming of all arithmetic components. Here, a possible solution is given by using an efficient VLSI architecture of a Mitchell’s algorithm based Iterative Logarithmic Multiplier (ILM) with a modified Leading One Detector (LOD) architecture and a seamless pipelined technique. The proposed work is based on hardware minimization at the same error cost as previously reported architectures. We use VHDL to design the existing and proposed Mitchell’s algorithm based iterative logarithmic multipliers. Both multiplier designs are evaluated with the Synopsys Design Compiler using 90 nm CMOS technology, and the results are compared in terms of Data Arrival Time (DAT), area, power, Area Delay Product (ADP), and energy. The proposed Mitchell's based ILM gives 33.18 %, 39.03 %, and 31.62 % less ADP and 25.08 %, 38.08 %, and 46.72 % less energy for the 8-, 16-, and 32-bit architectures, respectively, in comparison with the reported ILM. The importance of LODs and the seamless pipeline is shown in an efficient architecture of the Mitchell's based ILM.
- Research Article
70
- 10.1016/s0026-2692(03)00172-1
- Jul 16, 2003
- Microelectronics Journal
An efficient reconfigurable multiplier architecture for Galois field GF(2m)
- Research Article
- 10.25728/assa.2018.18.4.617
- Dec 28, 2018
- Advances in systems science and applications
The tremendous increase in the use of portable electronic devices is due to developments in the fields of signal processing and electronic technology. These battery-operated devices need reduced power consumption together with increased performance and long battery life. Since CMOS technology scaling is fast approaching its physical limits of minimum supply voltage and feature size, the hardware designer has to opt for new multiplier architectures to achieve low power and high speed. This paper proposes an area- and power-efficient approximate multiplier architecture. The error metrics are estimated to verify its performance advantage over other approximate multipliers. Using the Frequency Response Masking approach, a 6-band non-uniform digital FIR filter bank is developed using the approximate multiplier for a hearing aid application. Audiogram matching is done with audiograms of two different types of hearing loss and the matching error is computed. Simulation results show that the audiogram matching error falls within a +/- 4 dB range.
- Conference Article
5
- 10.1109/ic3a48958.2020.233291
- Feb 1, 2020
Multiplication is a ubiquitous operation in a growing set of media processing applications (graphics, audio, video, and image). Many of these applications, however, possess an inherent quality of error resilience. Thus multipliers that are not very precise but return an approximate value can be utilized in such applications. Such units may be expected to yield area savings while also reducing power consumption. In recent years, the logarithmic number system (LNS) has been increasingly used as an alternative to the binary number system, as it converts multiplication to addition, resulting in simplified hardware. However, LNS multipliers suffer from inherent error, and any effort to improve their accuracy would help increase their usage in arithmetic computations with efficient hardware. In this paper, a method is proposed that combines Mitchell’s approximation with a hardware pruning technique, leading to an area-efficient multiplier architecture without compromising precision. Simulations show that the proposed multiplier architectures are efficient in both area and delay when compared to existing designs.
- Research Article
2
- 10.1007/s00034-019-01137-7
- May 15, 2019
- Circuits, Systems, and Signal Processing
Hardware multiplier circuits decide the speed and power consumption in the execution of digital signal processing algorithms. The desirable feature of reduced area and power consumption for battery-driven multimedia gadgets can be realized by replacing the power hungry multiplier circuits with approximate multiplier circuits. The approximation techniques reduce the complexity of the design and improve the energy efficiency of the circuit. This paper proposes an area and power efficient approximate unsigned integer multiplier architecture based on wordlength reduction. It is designed to meet a pre-specified error performance with improved area and power reduction compared with similar designs. It is extended further for the signed multiplier architecture. The circuit characteristics are analyzed to establish the suitability of the proposed design for low-power applications. Synthesis results show that the proposed unsigned multiplier consumes 65% less power than the exact Wallace multiplier. The area requirement of the proposed multiplier reduces by 50% compared to an exact multiplier. The multiplier is tested for image filtering to establish the efficacy of the design in multimedia applications.
- Conference Article
3
- 10.1109/mwscas.2006.382305
- Aug 1, 2006
A comprehensive study of spurious activity propagation, based on transistor-level simulations targeting a 0.18 μm CMOS process, is carried out in traditional multiplier architectures (Carry-Save, Carry-Save with Booth recoding, and Wallace tree). The results suggest implementing self-timed multipliers, i.e. multipliers in which partial products are triggered by an independent delay line: they have the property of suppressing unnecessary switching activity. They are discussed in terms of area occupation and, especially, power dissipation and Energy-Delay Product (EDP). After that, a new self-timed multiplier architecture is introduced. Transistor-level simulations show a dissipation of 2.0 μW/MHz against 4.8 μW/MHz for a recently published self-timed multiplier and 4.1 μW/MHz for the most efficient traditional architecture (Wallace), with a reduced 5% area overhead compared to the latter.
- Research Article
75
- 10.1109/tc.2011.125
- Aug 1, 2012
- IEEE Transactions on Computers
Since its acceptance as the adopted symmetric-key algorithm, the Advanced Encryption Standard (AES) and its recently standardized authentication Galois/Counter Mode (GCM) have been utilized in various security-constrained applications. Many of the AES-GCM applications are power and resource constrained and require efficient hardware implementations. In this paper, different application-specific integrated circuit (ASIC) architectures of building blocks of the AES-GCM algorithms are evaluated and optimized to identify the high-performance and low-power architectures for the AES-GCM. For the AES, we evaluate the performance of more than 40 S-boxes utilizing a fixed benchmark platform in 65-nm CMOS technology. To obtain the least complexity S-box, the formulations for the Galois Field (GF) subfield inversions in GF(2^4) are optimized. By conducting exhaustive simulations for the input transitions, we analyze the average and peak power consumptions of the AES S-boxes considering the switching activities, gate-level netlists, and parasitic information. Additionally, we present high-speed, parallel hardware architectures for reaching low-latency and high-throughput structures of the GCM. Finally, by investigating the high-performance GF(2^128) multiplier architectures, we benchmark the proposed AES-GCM architectures using quadratic and subquadratic hardware complexity GF(2^128) multipliers. It is shown that the performance of the presented AES-GCM architectures outperforms the previously reported ones in the utilized 65-nm CMOS technology.
- Conference Article
- 10.1109/scopes.2016.7955716
- Oct 1, 2016
This paper proposes the design of an efficient constant multiplier architecture using carry select adders. The algorithms proposed earlier to implement multiple constant multiplication (MCM) for an efficient FIR filter design can be classified into two main groups: graph-based algorithms and common subexpression elimination (CSE) algorithms. The binary CSE (BCSE) algorithm uses the binary representation of coefficients to implement higher-order FIR filters with fewer adders than Canonic Signed Digit (CSD)-based CSE methods. In the VHBCSE algorithm, a 2-bit binary common subexpression elimination algorithm is first applied vertically across adjacent coefficients on the 2-D space of the coefficient matrix, followed by 4-bit and 8-bit BCSE algorithms applied horizontally within each coefficient. This reduces power consumption by minimizing switching activity and also improves area and delay. The partial products generated by the VHBCSE methodology and the controlled additions are fed to an efficient carry select adder (CSLA), instead of the earlier ripple carry adder, to reduce area and delay.
- Conference Article
2
- 10.1109/icrtit.2014.6996204
- Apr 1, 2014
Although a number of efficient high-level design algorithms have been put forward for realizing FIR filters with the least number of arithmetic operations, they do not take into account the low-level implementation issues that can make a real difference to the area and delay of an FIR filter design. In this paper, we first present the delay-efficient addition and multiplication architectures that are used in the filter operation. We use a multiplication algorithm that reduces the bit width, and then an efficient parallel adder that implements the two forms of FIR filter with very little delay while also considering the cost of each operation. The paper presents two different types of FIR filter with 8 and 16 taps, of which one form is better for speed and the other is better for area.