Articles published on Approximate Multiplier
352 Search results
- Research Article
- 10.55041/isjem.acme144
- Apr 12, 2026
- International Scientific Journal of Engineering and Management
- Deepthi Chittibona + 3 more
This paper presents a 16-bit Reconfigurable Approximate Multiplier (ReM) architecture designed for energy-efficient neuromorphic computing applications. The proposed design integrates dynamic precision scaling and lightweight redundancy to achieve improved power–area efficiency while maintaining acceptable computational accuracy. Multiple precision modes enable adaptive operation based on workload requirements, allowing the multiplier to balance energy consumption and performance dynamically. A Precision Control Unit (PCU) regulates approximation levels, while a Reduced-Precision Redundancy mechanism enhances reliability with minimal hardware overhead. The architecture is implemented and validated on FPGA using Xilinx Vivado to evaluate delay, power consumption, and resource utilization. Experimental results demonstrate that the proposed design significantly reduces overall on-chip power consumption while maintaining stable performance, with only a moderate increase in logic resource utilization. Behavioral simulation confirms correct functional operation under different precision modes. The modular and scalable structure of the proposed design makes it suitable for Spiking Neural Networks (SNNs) and other energy-constrained edge AI applications, offering an effective trade-off between efficiency, flexibility, and reliability. Keywords: Neuromorphic Architecture, Spiking Neural Networks (SNNs), Approximate Arithmetic Units, Reconfigurable Hardware
- Research Article
- 10.59256/ijsreat.20260602020
- Apr 10, 2026
- International Journal Of Scientific Research In Engineering & Technology
- Athiradh Mutyala + 2 more
Approximate computing has become a popular technique for designing low-power, area-efficient digital circuits for error-tolerant applications such as image processing, signal processing, and machine learning. Multipliers are among the main arithmetic building blocks and contribute greatly to both overall power consumption and silicon area. In this work, we describe the design and implementation of an area- and power-efficient approximate multiplier that introduces approximation into the partial product generation stage. More specifically, some of the AND gates of the proposed architecture are replaced with OR gates, with the largest reduction occurring at the least significant bit positions, thus reducing the number of transistors, switching activity, and logic complexity while maintaining the accuracy of the most significant bits. The proposed approximate multiplier was implemented and validated at the transistor level. Functional verification was conducted through simulation. A comparison with a traditional exact multiplier shows that the technique yields a large reduction in power consumption and hardware complexity while sustaining an acceptable level of computational accuracy. Overall, the results show that approximate multipliers offer a significant accuracy-for-efficiency trade-off in error-tolerant, low-power VLSI applications.
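The AND-to-OR substitution described in this abstract can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the abstract does not say which gates are replaced, so the rule used here (every partial-product bit in a column below a hypothetical cutoff `k` uses OR) and the names `approx_multiply`, `n`, and `k` are illustrative, not the authors' design.

```python
def approx_multiply(a: int, b: int, n: int = 8, k: int = 4) -> int:
    """Unsigned n-bit multiply with approximate partial products.

    Assumption (not from the paper): partial-product bits whose column
    index i + j is below k are generated with OR instead of AND.
    Setting k = 0 gives the exact product.
    """
    result = 0
    for i in range(n):
        a_i = (a >> i) & 1
        for j in range(n):
            b_j = (b >> j) & 1
            col = i + j
            # OR in the low-order columns, exact AND everywhere else
            bit = (a_i | b_j) if col < k else (a_i & b_j)
            result += bit << col
    return result


if __name__ == "__main__":
    a, b = 200, 100
    print("exact :", a * b)
    print("approx:", approx_multiply(a, b))
```

Because OR is never smaller than AND, this model can only overestimate, and with `k = 4` the error is confined to the four lowest columns (at most 1·1 + 2·2 + 3·4 + 4·8 = 49), which is why the most significant bits stay accurate.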
- Research Article
- 10.1016/j.sysarc.2026.103686
- Apr 1, 2026
- Journal of Systems Architecture
- Ruiqi Chen + 7 more
FP8ApproxLib: An FPGA-based approximate multiplier library for 8-bit floating point
- Research Article
- 10.1142/s0218126626501938
- Mar 18, 2026
- Journal of Circuits, Systems and Computers
- Sanjiv Kumar Gupta + 3 more
Approximate computing plays a significant role in the design of energy-efficient architectures for error-tolerant systems. This paper proposes a 4-2 approximate compressor using an improved design of a 3-2 approximate compressor with the help of AND-OR recoding. An architecture of the approximate Dadda multiplier is presented using the proposed 4-2 approximate compressor. Synthesis results show improvements in terms of power delay product (PDP), energy-delay product (EDP), and area delay product (ADP) as compared to the exact multiplier as well as the existing approximate multipliers. The proposed approximate multiplier achieves significant reductions in PDP, EDP, and ADP compared to exact multipliers, with decreases of 61%, 71%, and 64%, respectively. The real-life application of the proposed approximate multiplier is illustrated with the help of image blending and smoothing. The efficacy of the proposed multiplier is validated via the Figure of merit (FOM) and it is found that the proposed multiplier provides better results than the previously reported approximate multipliers.
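For readers unfamiliar with 4-2 compressors, the sketch below models the exact compressor (two chained full adders) and counts the erroneous truth-table rows of a toy approximate variant. The toy variant is a hypothetical example chosen for illustration; it is not the AND-OR recoded compressor proposed in the paper, whose logic the abstract does not give.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) of three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def exact_compressor_4_2(x1, x2, x3, x4, cin):
    """Exact 4-2 compressor: x1+x2+x3+x4+cin == s + 2*(carry + cout)."""
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def toy_approx_compressor_4_2(x1, x2, x3, x4):
    """Hypothetical approximate compressor with no cin/cout (illustration only)."""
    s = (x1 ^ x2) | (x3 ^ x4)
    carry = (x1 & x2) | (x3 & x4)
    return s, carry


def count_error_rows() -> int:
    """Count input rows where the toy compressor's value differs from the true sum."""
    errors = 0
    for bits in product((0, 1), repeat=4):
        s, carry = toy_approx_compressor_4_2(*bits)
        if s + 2 * carry != sum(bits):
            errors += 1
    return errors


if __name__ == "__main__":
    print("erroneous rows out of 16:", count_error_rows())
```

Dropping `cin` and `cout` is what shortens the carry chain in approximate designs; the price is a number of incorrect truth-table rows, which designs of this kind try to minimize.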
- Research Article
- 10.55041/isjem05693
- Mar 16, 2026
- International Scientific Journal of Engineering and Management
- Panchakarla Viswaja + 3 more
Edge computing platforms require energy-efficient arithmetic units to handle real-time, data-intensive workloads under strict power and area constraints. Conventional multipliers consume considerable power and hardware resources, making them less suitable for resource-limited edge devices. This work proposes a low-power approximate multiplier architecture that employs an optimized 5:2 approximate compressor to improve partial product reduction efficiency. By reducing the number of reduction stages and switching activity, the design achieves lower dynamic power consumption and improved power delay product (PDP) while maintaining acceptable accuracy for error-tolerant applications. System-level validation using MATLAB-based image processing demonstrates that the proposed multiplier is well suited for signal processing, image processing, and deep neural network workloads in resource-constrained edge environments.
- Research Article
- 10.1109/tcsi.2025.3625509
- Mar 1, 2026
- IEEE Transactions on Circuits and Systems I: Regular Papers
- Elham Esmaeili + 3 more
The Booth multiplier provides high-performance signed multiplication by encoding and decreasing partial products (PPs) generated using the radix-4 Booth algorithm. Although the radix-8 produces fewer PPs than the radix-4 and needs fewer adders to accumulate PPs, it is not fast because the odd multiples of the multiplicand are generated in a complex unit, and attaining a high performance is challenging. This work alleviates this issue using approximate designs. An approximate 4:2 compressor is proposed in which the inputs are encoded by the generation and propagation method for the reduction of faulty rows in the truth table. The compressor, radix-8 Booth encoder, and PP generation (PPG) are used to attain a signed 16×16-bit approximate multiplier, which is synthesized targeting a 90 nm complementary metal oxide semiconductor (CMOS) technology. The multiplier is efficiently implemented on field programmable gate arrays (FPGAs) to perform the Sobel operator for edge detection. The occupied area, dynamic power dissipation, and power-delay-product (PDP) × mean relative error distance (MRED) of the presented multiplier are superior to the lookup table (LUT)-based multipliers of an FPGA. The Sobel edge detection algorithm implemented on the FPGA detects 99.15% of edges with 33.33% energy savings, while the structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR) are 0.88 and 32.92 dB, respectively.
- Research Article
- 10.1016/j.comcom.2026.108457
- Mar 1, 2026
- Computer Communications
- Mohamad A Alawad + 3 more
Adaptive energy-aware approximate multiplier with dynamic reconfiguration for IoT edge applications
- Research Article
- 10.1016/j.vlsi.2025.102620
- Mar 1, 2026
- Integration
- Yiqi Zhou + 5 more
Error expectation-driven design and energy optimization of approximate multipliers
- Research Article
- 10.1016/j.future.2025.108220
- Mar 1, 2026
- Future Generation Computer Systems
- Mohammad Javad Askarizadeh + 5 more
Robust DCNN: The impact of approximate multipliers in defending against adversarial attacks
- Research Article
- 10.1016/j.rineng.2025.108934
- Mar 1, 2026
- Results in Engineering
- Arun Kolukulapally + 2 more
CNTFET-based hybrid approximate multiplier design with optimized compressors for error-resilient image processing and enhanced power-delay efficiency
- Research Article
- 10.1038/s41598-026-40524-4
- Feb 21, 2026
- Scientific reports
- Jaiza Hassan + 3 more
Multiplication is a fundamental mathematical operation that finds extensive applications across various disciplines, particularly in computation-intensive and error-resilient applications, such as image processing. As hardware circuits become more complex, there is a growing demand for approximation circuit methods. Implementation of approximate multipliers has the potential to yield substantial reductions in hardware costs while maintaining acceptable performance levels. Most current designs for approximate multipliers are optimized for ASIC-based circuits, which may not produce similar performance improvements when adapted for FPGA-based circuits. Additionally, many of these existing multiplier designs are limited to unsigned numbers. This paper proposes a novel approach for designing signed approximate multipliers tailored specifically for FPGAs. Two efficient architectures are introduced that efficiently utilize key FPGA components, such as LUTs and Carry4 primitives, by designing the optimal LUT-Carry4 netlists. A Pareto-based analysis is also performed to balance trade-offs and achieve a low mean error distance (MED). Simulation results confirm that the proposed architectures offer superior performance compared to existing signed approximate multipliers, delivering improved power efficiency, reduced resource usage, shorter critical path delay (CPD), and enhanced computational accuracy. The practical applicability of these approximate multipliers is further validated through their use in image processing applications.
- Research Article
- 10.1080/03772063.2026.2624599
- Feb 18, 2026
- IETE Journal of Research
- V Arun Antony + 1 more
Bioelectronics is used in brain-machine interfaces and in neural engineering. These next-generation devices can be used for neuroprostheses, which should be energy-efficient and perform neuromorphic computing with bidirectional information transmission. The proposed study is suitable for designing a computing architecture for auditory, visual and multisensory systems. The bioinspired interactive computing device fuses multiple/mixed sensing signals with an exact and liquid state machine, which is effective in reducing power consumption and enhancing computing capability. In very large-scale integration, digital data processing methods can be designed using brain-inspired architectures. Here, a new approximate compressor design for neuromorphic computing is presented, based on a Multigate Field Effect Transistor (MuGFET), namely the Fin Field Effect Transistor (FinFET). The leakage current problems and second-order effects in Complementary Metal Oxide Semiconductor (CMOS) are eliminated in the new approximate 4–2 compressor design. The FinFET and approximate multiplier for brain-inspired neuromorphic computing and liquid state machine architecture are effective, as observed from the results. In addition, this paper presents the design approach of a synapse circuit for a bioelectronic ubiquitous neuroprosthetic device. The proposed circuit design combines nanoelectronics, electroceuticals, and neuroprostheses. The design consists of a synapse circuit which can provide electrical neuromodulation stimulation and be part of a neuromorphic architecture for bidirectional interactions. The delay is optimised along with power consumption, and both are reported from extensive simulation. For implementation, FinFET and CMOS predictive technology models were used.
- Research Article
- 10.1088/1402-4896/ae31af
- Jan 9, 2026
- Physica Scripta
- Talla Srinivasa Rao + 2 more
In this paper, Modified Wallace Tree-Based Approximate Multipliers (MWTBAMs) were developed with a novel 4:2 compressor and Error Compensation (EC) technique to obtain improved power–delay–accuracy characteristics to enhance convolutional neural networks (CNNs) and image processing applications. 16×16 and 32×32 AMs have been introduced in this design using a recursive method to combine various types of lower-order multipliers, including existing approximate and exact multipliers. Recursive AMs enhance accuracy, but at the expense of increased complexity. More specifically, the proposed multipliers were implemented using Verilog, Cadence RTL Compiler, MATLAB and Xilinx Vivado, and simulation outcomes show a notable improvement in area, power consumption and delay. Synthesis results in 90-nm CMOS also confirm significant hardware cost reductions for the 8-, 16-, and 32-bit multipliers: up to 87% lower power, 24% lower delay, and 38% improvement in combined power–delay product (PDP) compared to existing approximate multipliers. The accuracy analysis shows that the proposed multipliers yield significant error reductions, achieving up to 99.7% lower MED and 99.6% lower MRED as compared to state-of-the-art Wallace-based and recursive approximate multipliers. In addition to the arithmetic-level studies, the multipliers proposed here display strong application-level reliability. The 8-bit MWTBAM delivers higher PSNR in image smoothing and edge detection, and in CNN inference it attains 97.95% accuracy on MNIST with only 2.05% loss, outperforming approximate designs.
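MED and MRED, the accuracy metrics quoted in this and several other abstracts, can be computed exhaustively for small operand widths. The sketch below assumes the common convention that MRED is averaged only over operand pairs with a nonzero exact product; `truncated_multiply` is a hypothetical stand-in for an approximate multiplier and is unrelated to the MWTBAM design.

```python
from typing import Callable


def error_metrics(approx: Callable[[int, int], int], n: int = 8) -> tuple[float, float]:
    """Return (MED, MRED) of an unsigned n-bit approximate multiplier.

    MED  = mean of |approx - exact| over all operand pairs.
    MRED = mean of |approx - exact| / exact over pairs with exact != 0
           (assumed convention; zero products are skipped).
    """
    total_ed = 0
    total_red = 0.0
    pairs = 0
    nonzero_pairs = 0
    for a in range(1 << n):
        for b in range(1 << n):
            exact = a * b
            ed = abs(approx(a, b) - exact)
            total_ed += ed
            pairs += 1
            if exact != 0:
                total_red += ed / exact
                nonzero_pairs += 1
    return total_ed / pairs, total_red / nonzero_pairs


def truncated_multiply(a: int, b: int, drop: int = 4) -> int:
    """Hypothetical approximate multiplier: zero the lowest `drop` product bits."""
    return ((a * b) >> drop) << drop


if __name__ == "__main__":
    med, mred = error_metrics(truncated_multiply)
    print(f"MED = {med:.3f}, MRED = {mred:.5f}")
```

Exhaustive evaluation is feasible for 8-bit operands (65,536 pairs); for 16-bit and wider multipliers, random sampling of operand pairs is the usual substitute.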
- Research Article
- 10.1038/s41598-026-35104-5
- Jan 7, 2026
- Scientific reports
- Pegah Foroutan + 1 more
The main objective of this paper is to design low-power, high-speed approximate multipliers suitable for fault-tolerant applications in image and signal processing, aiming to minimize hardware cost, power consumption, and latency while maintaining acceptable accuracy for human-perceivable outputs. The novelty of the proposed technique lies in the integration of flow-state logic with 4:2 dual-purpose compressors based on 7 nm CNTFET technology, which introduces six new compressor types that use adjustable threshold voltages, direct current summing without threshold detectors, and improved noise margins to significantly reduce sensitivity to process–voltage–temperature (PVT) variations. This technology is up to 30–50% less vulnerable compared to previous CMOS and FinFET-based designs. These compressors are used to implement two types of 8 × 8 Dadda multipliers: one with uniform approximate compressors and the other with truncated least significant bits and combined exact-approximate columns. Simulation results in HSPICE and MATLAB show that the optimized design achieves a power consumption of 0.52 mW, a latency of 1.88 ns, and a PDP of 0.97 pJ, while maintaining comparable error rates and improving image quality metrics, such as MSSIM by 62% (from 59.61 to 96.83%) and PSNR by 15–20% compared to existing multipliers in the image multiplication function.
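The figures in this abstract can be cross-checked with simple arithmetic, since the power-delay product is power multiplied by delay. The short check below uses only the numbers quoted above; the variable names are illustrative.

```python
# Values as quoted in the abstract
power_w = 0.52e-3   # 0.52 mW
delay_s = 1.88e-9   # 1.88 ns

# PDP = power * delay, converted from joules to picojoules
pdp_pj = power_w * delay_s * 1e12

# Relative MSSIM improvement from 59.61% to 96.83%
mssim_gain_pct = (96.83 - 59.61) / 59.61 * 100

if __name__ == "__main__":
    print(f"PDP = {pdp_pj:.3f} pJ")
    print(f"MSSIM gain = {mssim_gain_pct:.1f}%")
```

The product comes to about 0.98 pJ, which agrees with the reported 0.97 pJ up to rounding, and the MSSIM change is a relative gain of roughly 62%, matching the stated figure.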
- Research Article
- 10.1109/access.2026.3678440
- Jan 1, 2026
- IEEE Access
- Ebrahim Farahmand + 6 more
In this paper, we propose a scalable approximate multiplier design, scaleTRIM, that approximates the multiplication operation using fitted linear functions, also referred to as linearization. We show that multiplication operations can be completely replaced by low-cost addition and bit-wise shift operations by exploiting linearization. Moreover, our proposed design utilizes a lookup table (LUT)-based compensation unit as a novel error-reduction method. In essence, input operands are truncated to a reduced bit-width representation (i.e., h bits) based on their leading-one positions. Then, a curve-fitting method is employed to map the product term to a linear function. Additionally, a piecewise constant error-correction term is used to reduce the approximation error. To compute the piecewise constant, we divide the function space into M segments and average the errors within each segment. In particular, our multiplier supports various degrees of truncation and error compensation to offer a range of accuracy-efficiency trade-offs. The proposed multiplier improves the Mean Relative Error Distance (MRED) by about 15.2% while satisfying the efficiency constraint and improves the Power Delay Product (PDP) by about 22.8% while satisfying the accuracy and efficiency constraints compared to different state-of-the-art approximate multipliers. From a usability perspective, our evaluation of the proposed design for image classification using Deep Neural Networks (DNNs) demonstrates that scaleTRIM offers a better accuracy-efficiency trade-off than state-of-the-art approximate multiplier designs.
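The general idea of replacing multiplication with shifts and additions after leading-one detection and truncation to h bits can be sketched as follows. This is not the scaleTRIM design: it uses the plain first-order form (1 + x)(1 + y) ≈ 1 + x + y instead of the paper's fitted linear function, and it omits the LUT-based compensation entirely. The function name and the default `h = 4` are assumptions made for illustration.

```python
def linearized_multiply(a: int, b: int, h: int = 4) -> int:
    """Approximate unsigned product using only shifts and additions.

    Each operand is written as 2**k * (1 + x) with 0 <= x < 1, where k is
    the leading-one position. x is truncated to h fractional bits and the
    product is approximated by 2**(ka + kb) * (1 + x + y), i.e. the x*y
    term is dropped (simplified stand-in for a fitted linear function).
    """
    if a == 0 or b == 0:
        return 0
    ka = a.bit_length() - 1          # leading-one position of a
    kb = b.bit_length() - 1          # leading-one position of b
    xa = ((a - (1 << ka)) << h) >> ka  # floor(x * 2**h)
    xb = ((b - (1 << kb)) << h) >> kb  # floor(y * 2**h)
    mantissa = (1 << h) + xa + xb      # (1 + x + y) scaled by 2**h
    shift = ka + kb - h
    return mantissa << shift if shift >= 0 else mantissa >> -shift


if __name__ == "__main__":
    for a, b in [(16, 8), (100, 50), (255, 255)]:
        print(a, b, "exact:", a * b, "approx:", linearized_multiply(a, b))
```

Because the dropped x·y term and the truncation both reduce the result, this sketch always underestimates; an error-compensation term of the kind described in the abstract exists precisely to correct that systematic bias.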
- Research Article
- 10.1109/les.2026.3676574
- Jan 1, 2026
- IEEE Embedded Systems Letters
- Vasundhara Trivedi + 1 more
Approximate computing has introduced a paradigm shift in hardware-optimized implementation of Edge-AI applications by balancing accuracy and area constraints simultaneously. In this work, a hardware-efficient, segmentation-based 16-bit approximate multiplier, ACSAM, is presented for resource-constrained applications with error tolerance capabilities. The proposed 16-bit multiplier integrates various combinations of conventional and proposed 8-bit approximate multipliers with unique shifting and rounding strategies. The proposed multiplier ACSAM achieved up to 18.9% improvement in LUT utilisation on FPGA, and up to 2.85× reduction in power and up to 12.27% reduction in area for ASIC implementation on a 65 nm technology node, compared to the state-of-the-art works. ACSAM is validated using an image-blurring application to demonstrate its suitability for DSP and image-processing tasks. Additionally, FPGA and ASIC evaluations confirm its adaptability across diverse implementation requirements.
- Research Article
- 10.1109/tcad.2026.3656474
- Jan 1, 2026
- IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
- Lu Zhang + 7 more
Approximate computing has garnered significant attention due to its potential to reduce power consumption, enhance performance, and simplify circuit design. However, the security implications of applying approximate computing techniques remain largely unexplored. Identifying all possible vulnerabilities in approximate designs is very challenging. One of the challenges stems from the lack of insightful methodologies and metrics to perform a precise security evaluation. This paper presents ApproPower, a security-driven framework for pre-silicon evaluation of power side-channel leakage in approximate multipliers. ApproPower enables an analysis of how approximation techniques influence power side-channel leakage and employs symbolic path analysis to estimate delay-dependent power leakage behaviors at design time. Using a set of open-source approximate multipliers, we examine the relationship between data precision, approximate strategies, and measured power leakage. Our results show that symbolic path analysis provides useful guidance for identifying potential power side-channel risks. We also observe that some approximate designs can offer improved resource efficiency while exhibiting reduced leakage; for instance, in the mul7x7u set, a majority of benchmark circuits demonstrate lower leakage after approximation, with mul7x7u_03M achieving a 35% reduction in resource usage compared to mul8x7u_3C6.
- Research Article
- 10.1109/access.2026.3684720
- Jan 1, 2026
- IEEE Access
- Y Rasheed + 6 more
AMPEREISH: Approximate Multipliers for Power Efficiency in FPGA Designs Using Internal-Self-Healing
- Research Article
- 10.1109/access.2026.3653253
- Jan 1, 2026
- IEEE Access
- Vasundhara Trivedi + 4 more
The growing demand for efficient deep learning inference on edge devices requires hardware that is both precision-adaptive and resource-efficient. This paper introduces C-SIMD, a CORDIC-driven, configurable SIMD Processing Element (PE) architecture for scalable, multi-precision MAC operations in DNN accelerators. C-SIMD supports dynamic operand precision (4/8/16/32-bit) and enables symmetric and asymmetric computation modes, covering integer and fixed-point arithmetic. By leveraging partial product computation with pipelined 8-bit CORDIC-based approximate multipliers, the architecture scales efficiently to higher precision while achieving notable area and power savings. A configurable pipeline offers tunable trade-offs between accuracy and complexity, making C-SIMD suitable for resource-constrained inference. Strategic reuse of the adder in the accumulation path enhances throughput and optimizes resource utilization. Unlike prior designs, C-SIMD fully exploits available resources and supports configurations such as 16 parallel 8×8-bit, 4 parallel 16×16-bit, single 32×32-bit, and asymmetric 32×8-bit MACs. Hardware evaluation demonstrates up to 14.29% area savings and as much as 16.17× throughput improvement. The proposed C-SIMD_Low (4/8/16) achieves 7.04 GOP/s, while C-SIMD_High (8/16/32) attains 4.16 GOP/s, delivering a 4× performance-efficiency gain over prior MAC architectures. Inference tests indicate minimal accuracy loss—below 1% on MNIST-LeNet, under 2.9% on CIFAR-10-AlexNet, and less than 2.2% on CIFAR-10-VGG16 compared to float32 baselines—demonstrating its potential for high-throughput, energy-efficient Edge-AI systems.
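Scaling to higher precision from 8-bit partial products rests on a standard decomposition: a 16×16-bit product is the shifted sum of four 8×8-bit products. The sketch below shows only that arithmetic identity; `mul8` is an exact placeholder for the 8-bit unit (the paper uses pipelined CORDIC-based approximate multipliers, which are not modeled here), and all names are illustrative.

```python
from typing import Callable


def mul8(a: int, b: int) -> int:
    """Placeholder 8x8-bit unsigned multiplier (exact here; assumption)."""
    return (a & 0xFF) * (b & 0xFF)


def mul16_from_8(a: int, b: int, base_mul: Callable[[int, int], int] = mul8) -> int:
    """Build a 16x16-bit unsigned product from four 8x8-bit partial products.

    a = a_hi * 2**8 + a_lo and b = b_hi * 2**8 + b_lo, so
    a * b = (a_hi*b_hi << 16) + ((a_hi*b_lo + a_lo*b_hi) << 8) + a_lo*b_lo.
    """
    a_hi, a_lo = (a >> 8) & 0xFF, a & 0xFF
    b_hi, b_lo = (b >> 8) & 0xFF, b & 0xFF
    high = base_mul(a_hi, b_hi) << 16
    middle = (base_mul(a_hi, b_lo) + base_mul(a_lo, b_hi)) << 8
    low = base_mul(a_lo, b_lo)
    return high + middle + low


if __name__ == "__main__":
    a, b = 12345, 54321
    print(mul16_from_8(a, b), a * b)
```

Passing an approximate 8-bit multiplier as `base_mul` shows how errors in the building block propagate: an error in the high partial product is weighted by 2^16, so approximation is usually best tolerated in the low partial product.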
- Research Article
- 10.1109/les.2026.3676640
- Jan 1, 2026
- IEEE Embedded Systems Letters
- M C Parameshwara + 2 more
This paper presents a new area-efficient approximate 4-2 compressor with improved image quality and error metrics. The proposed compressor is developed using functional approximation and Karnaugh map (K-map) simplification. The circuit behavior is verified through modeling and synthesis using Verilog HDL. The effectiveness of the proposed compressor for image multiplication is evaluated using an 8 × 8 approximate multiplier simulated in MATLAB. The performance of the proposed design is compared with existing compressors using circuit, error, and image-quality metrics. Post-synthesis and image-multiplication results show that the proposed compressor achieves excellent power, delay, and image-quality performance compared to existing area-efficient designs.