Articles published on Partial product reduction
97 Search results
- Research Article
- 10.55041/isjem05693
- Mar 16, 2026
- International Scientific Journal of Engineering and Management
- Panchakarla Viswaja + 3 more
Edge computing platforms require energy-efficient arithmetic units to handle real-time, data-intensive workloads under strict power and area constraints. Conventional multipliers consume considerable power and hardware resources, making them less suitable for resource-limited edge devices. This work proposes a low-power approximate multiplier architecture that employs an optimized 5:2 approximate compressor to improve partial product reduction efficiency. By reducing the number of reduction stages and switching activity, the design achieves lower dynamic power consumption and improved power-delay product (PDP) while maintaining acceptable accuracy for error-tolerant applications. System-level validation using MATLAB-based image processing demonstrates that the proposed multiplier is well suited for signal processing, image processing, and deep neural network workloads in resource-constrained edge environments.
- Research Article
- 10.4108/eetiot.10362
- Mar 3, 2026
- EAI Endorsed Transactions on Internet of Things
- K.B Bhavya + 1 more
This paper presents a hybrid low-power Wallace Tree multiplier architecture employing Gate Diffusion Input (GDI) and Pass Transistor Logic (PTL) techniques, specifically designed for efficient integration into Discrete Cosine Transform (DCT)-based image compression systems. The proposed design incorporates a novel hybrid full adder that leverages the low-power advantages of GDI and the high-speed characteristics of PTL, resulting in a compact, power-optimized solution. GDI logic is used for a low-transistor-count realization of the sum and carry logic, while PTL enhances XOR computation and signal propagation with minimal delay and area overhead. This hybrid adder is embedded into a Wallace Tree multiplier, which serves as a critical computational block within the 2D-DCT transform engine commonly used in JPEG image compression. The multiplier's efficient structure significantly reduces the number of logic stages needed for partial product reduction, ensuring high throughput and reduced switching activity. Implemented in 90 nm and 45 nm CMOS technologies, the design achieves notable improvements in power-delay product (PDP), area, and energy efficiency compared to conventional CMOS or single-style logic designs. Simulations performed with the Cadence Spectre simulator show up to a 10× reduction in power consumption and substantial area savings. These results establish the hybrid GDI-PTL-based Wallace Tree multiplier as a highly suitable solution for real-time and portable image processing applications, including mobile devices, low-power high-efficiency video encoders, and energy-constrained embedded systems.
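To make the idea of staged partial product reduction concrete, here is a minimal bit-level sketch of a Wallace-style carry-save reduction using exact full adders. It is a generic behavioral illustration, not the paper's GDI-PTL circuit; the function names (`full_adder`, `partial_products`, `reduce_once`, `wallace_multiply`) and the simplified grouping rule are assumptions made for this example.

```python
def full_adder(a, b, c):
    """Exact 3:2 compressor on single bits: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def partial_products(x, y, n):
    """cols[w] holds the AND-generated bits of weight 2**w for n-bit x and y."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((x >> i) & 1) & ((y >> j) & 1))
    return cols


def reduce_once(cols):
    """One reduction stage: compress every column with full and half adders."""
    out = [[] for _ in range(len(cols) + 1)]
    for w, bits in enumerate(cols):
        k = 0
        while len(bits) - k >= 3:
            s, c = full_adder(bits[k], bits[k + 1], bits[k + 2])
            out[w].append(s)
            out[w + 1].append(c)
            k += 3
        if len(bits) - k == 2:
            # Half adder modeled as a full adder with a zero third input.
            s, c = full_adder(bits[k], bits[k + 1], 0)
            out[w].append(s)
            out[w + 1].append(c)
        elif len(bits) - k == 1:
            out[w].append(bits[k])  # a single leftover bit passes through
    return out


def wallace_multiply(x, y, n=8):
    """Unsigned n-bit multiply; returns (product, number of reduction stages)."""
    cols = partial_products(x, y, n)
    stages = 0
    while max(len(c) for c in cols) > 2:
        cols = reduce_once(cols)
        stages += 1
    # The final carry-propagate addition is modeled by an integer sum.
    product = sum(bit << w for w, bits in enumerate(cols) for bit in bits)
    return product, stages
```

Each stage preserves the weighted sum of all bits, so the result equals `x * y` regardless of how many stages are needed; fewer stages correspond to a shorter critical path in hardware.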
- Research Article
1
- 10.1038/s41598-025-25239-2
- Nov 21, 2025
- Scientific Reports
- Aqib Amin Rather + 3 more
Approximate computing comes to the fore as an alternative paradigm to enhance efficiency in computing systems by trading off the system’s accuracy for better performance. This paper seeks to leverage the principles of approximate computing to design efficient multiplier architectures for FPGA platforms. Specifically, this work presents FPGA implementations of one accurate and two approximate multiplier units based on the Dadda algorithm. The multipliers employ a novel partial product reduction technique that minimizes the utilized resources and the critical path delay, offering a more resource-efficient alternative than traditional multipliers. Our accurate and best-performing approximate 8 × 8 multiplier shows an improvement of 28% and 37% in PDAP over the Xilinx exact multiplier and the most performance-efficient existing approximate multiplier, respectively. Further evaluation based on the processing of images with different modalities shows a substantial improvement in PSNR over the existing approximate multipliers, especially in the healthcare domain, thereby highlighting the possible application of the proposed multipliers in error-resilient medical imaging tasks.
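For orientation, the classic Dadda schedule (not the novel reduction technique of this paper) fixes a target column height for every reduction stage using the sequence 2, 3, 4, 6, 9, 13, ..., where each term is the floor of 1.5 times the previous one. The sketch below computes those targets; the function name `dadda_stage_heights` is hypothetical.

```python
def dadda_stage_heights(n_rows):
    """Target maximum column heights, one per reduction stage, for a
    partial product array that starts with n_rows rows."""
    if n_rows <= 2:
        return []  # two rows can go straight to the final adder
    heights = [2]
    # Grow the sequence d_{j+1} = floor(1.5 * d_j) while it stays below n_rows.
    while heights[-1] * 3 // 2 < n_rows:
        heights.append(heights[-1] * 3 // 2)
    return heights[::-1]  # first stage first


print(dadda_stage_heights(8))  # [6, 4, 3, 2]
```

An 8 × 8 multiplier starts with eight rows, so the classic schedule needs four reduction stages before the final two-row addition.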
- Research Article
- 10.1088/2631-8695/adfdaf
- Sep 1, 2025
- Engineering Research Express
- Burhan Khurshid
Abstract The indispensability of the multiplication operation in digital signal processing applications is well established. Most contemporary multiplier designs are mainly suited for ASICs. Implementing ASIC-based designs on FPGAs does not yield significant performance gains due to the fundamental architectural difference between the two platforms. A few FPGA-based multiplier designs have been proposed recently that focus on exploiting the architectural features of FPGAs, like LUTs and Carry4 primitives. However, these designs are far from optimal because the full computation potential of the underlying FPGA resources is not exploited. While many FPGA vendors also include high-performance hardwired and softcore multipliers, they are typically limited in number and suffer from high interconnect delays due to their fixed position in the FPGA fabric. To counter these issues, we present a softcore multiplier design that optimally exploits the underlying FPGA resources. Our implementation is based on a methodology that restructures the multiplier Boolean network so that the logic nodes are optimally distributed to LUTs and Carry4 primitives. While existing designs use Carry4 primitives only in the partial product reduction stage, our methodology enables the use of Carry4 primitives in both the partial product generation and partial product reduction stages. This results in a reduced LUT count and a faster structure. Our 8-bit multiplier uses only 35 LUTs and has a PDAP of 2740, as against 51 LUTs and a PDAP of 4454 for the area-optimized Xilinx IP multiplier and 60 LUTs and a PDAP of 4660 for the speed-optimized Xilinx IP multiplier. This accounts for 31% and 41% improvement in LUT count and 38% and 41% improvement in PDAP compared to the area- and speed-optimized Xilinx proprietary multipliers. Similarly, compared to the best 8-bit softcore multiplier in the literature, our design shows an improvement of 33% in PDAP. These performance trends are not one-off but persist as the word length of the multipliers increases beyond eight bits.
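The percentage figures quoted in this abstract can be reproduced from the raw LUT and PDAP numbers. The short check below assumes the 4660 figure for the speed-optimized Xilinx IP is a PDAP value like the others, and uses a hypothetical helper named `improvement`.

```python
def improvement(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# LUT count: proposed 35 vs. area-optimized 51 and speed-optimized 60
print(round(improvement(51, 35), 2))      # 31.37
print(round(improvement(60, 35), 2))      # 41.67

# PDAP: proposed 2740 vs. area-optimized 4454 and speed-optimized 4660
print(round(improvement(4454, 2740), 2))  # 38.48
print(round(improvement(4660, 2740), 2))  # 41.2
```

Truncated to whole percentages, these values match the 31%, 41%, 38%, and 41% stated in the abstract.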
- Research Article
- 10.55041/ijsrem43720
- Apr 4, 2025
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
- Y Ramtrivinayak + 3 more
Floating-Point Multiply-Accumulate (FPMAC) units are fundamental in high-performance computing applications such as digital signal processing and machine learning. This study presents an optimized FPGA-based FPMAC, integrating Booth encoding for multiplication, Wallace tree-based partial product reduction, and a 3:1 compressor for efficient accumulation. Comparative analysis highlights significant performance improvements over conventional designs, including a 30% reduction in delay, an operational frequency of 488 MHz, and a 22% increase in throughput. Additionally, power consumption is reduced by 55%, while resource utilization is optimized. The proposed architecture, with five pipeline stages, enhances computational efficiency, making it highly suitable for real-time embedded applications requiring high-speed floating-point operations. Keywords: FPGA, FPMAC, Verilog, Booth Encoding, Wallace Tree, BDSA, Compressor, Linear Zero Prediction, Delay, Power, Precision
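Booth encoding reduces the number of partial products that the reduction tree has to handle. The abstract does not state which radix is used, so the sketch below assumes the common radix-4 variant, which recodes an n-bit two's-complement multiplier into n/2 digits from the set {-2, -1, 0, 1, 2}. The function names are hypothetical, and the integer model ignores the floating-point aspects of the FPMAC.

```python
def booth_radix4_digits(y, n):
    """Recode the n-bit two's-complement value y (n even) into n/2
    radix-4 Booth digits, least significant digit first."""
    y &= (1 << n) - 1          # work on the raw n-bit pattern
    digits = []
    prev = 0                   # implicit bit y[-1] = 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # digit = y[i] + y[i-1] - 2*y[i+1]
        prev = b1
    return digits


def booth_multiply(x, y, n=8):
    """Signed multiply: each digit selects 0, +/-x, or +/-2x as a partial product."""
    return sum(d * x * 4 ** k for k, d in enumerate(booth_radix4_digits(y, n)))
```

For n = 8 this yields four partial products instead of eight, which is what shortens the subsequent Wallace tree.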
- Research Article
2
- 10.1080/1448837x.2025.2454812
- Mar 6, 2025
- Australian Journal of Electrical and Electronics Engineering
- Yogeswari Palanisamy + 5 more
ABSTRACT Multiply-Accumulate (MAC) units play a key role in Digital Signal Processing (DSP) systems by performing multiplication and addition in a single cycle. This work introduces a power-efficient MAC unit designed using an approximate computing approach. The proposed MAC leverages a Dadda multiplier (DM), known for its superior performance over compressor-based architectures, utilizing twin 4:2 compressors to optimize partial product reduction. To further enhance efficiency, approximate adders with minimal errors and simplified design are used, reducing power consumption and saving area. When compared to a traditional MAC unit built with a Brent-Kung adder and a Vedic multiplier, the proposed design demonstrates impressive improvements: a 26% reduction in area, 11% power savings, and 28% lower delay. The MAC’s effectiveness is validated in image processing tasks, such as edge detection, where it delivers results comparable to those of exact MAC units. Synthesized using the Xilinx Virtex 4 FPGA platform and implemented in Verilog HDL, the proposed design outperforms existing techniques in hardware efficiency, power optimization, and latency reduction. This makes it an excellent choice for high-performance DSP applications, combining speed, efficiency, and accuracy in a compact design.
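As background for the 4:2 compressors mentioned above, the following sketch models an exact 4:2 compressor built from two cascaded full adders and verifies it exhaustively. It illustrates the standard exact cell only; the twin-compressor arrangement and the approximate adders of the paper are not modeled, and the function names are assumptions.

```python
from itertools import product


def full_adder(a, b, c):
    """Exact single-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """Exact 4:2 compressor: x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout)."""
    s1, cout = full_adder(x1, x2, x3)   # cout does not depend on cin
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


# Exhaustive check over all 32 input combinations.
for bits in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)
```

Because `cout` is independent of `cin`, a row of such cells has no horizontal carry ripple, which is why they are popular in partial product reduction.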
- Research Article
3
- 10.3390/electronics14020333
- Jan 16, 2025
- Electronics
- Ioannis Rizos + 2 more
In the nano-scale era, enhancing speed while minimizing power consumption and area is a key objective in integrated circuits. This demand has motivated the development of approximate computing, particularly useful in error-tolerant applications such as multimedia, machine learning, signal processing, and scientific computing. In this research, we present a novel method to create approximate integer multiplier circuits. This work is based on a modification of the well-known Wallace tree multiplier, called the Reduced Complexity Wallace Multiplier (RCWM). Approximation is introduced by replacing conventional Full Adders with approximate ones during the partial product reduction phase. This research investigates the characteristics of 8×8-, 16×16-, and 32×32-bit Approximate Reduced Complexity Wallace Multipliers (ARCWM), evaluating their accuracy, area usage, delay, and power consumption. Given the vast search space created by different combinations and placements of these approximate Adders, a Genetic Algorithm was used to efficiently explore this space and optimize the ARCWMs. The resulting ARCWMs have an area reduction of up to 65% and a power consumption reduction of up to 70%, with no worse delay than the RCWM. Multipliers created with this method can be used in any application that requires parallel multiplication, such as neural accelerators, trading accuracy for area and power reduction. Additionally, an ARCWM can be used alongside a slow shift-and-accumulate multiplier trading off accuracy for faster calculation. This methodology provides valuable guidance for designers in selecting the optimal configuration of approximate Full Adders, tailored to the specific requirements of their applications. Alongside the methodology, we provide all of the tools used to achieve our results as open-source code, including the Register-Transfer Level (RTL) code of the 8×8-, 16×16-, and 32×32-bit Wallace Multipliers.
- Research Article
- 10.55041/ijsrem40282
- Dec 30, 2024
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
- Banoth Ashwini + 3 more
To address the need to reduce power consumption, approximate multipliers have emerged as a potential solution for fault-tolerant applications. In this work, we present a new 8x8 approximate multiplier that focuses on minimizing power consumption while maintaining a high degree of accuracy. The design has two key features. First, the compressors handle different weights with different levels of precision according to their importance, allowing for a trade-off between energy efficiency and error. Second, higher-order approximate compressors such as 8:2 compressors are used for intermediate weights to simplify the carry chain logic. This is, to our knowledge, the first design to successfully integrate higher-order approximate compressors into an approximate multiplier. Compared to an exact multiplier such as the Dadda tree multiplier, experimental results show that the proposed design offers significant energy savings while maintaining high accuracy. Key Words: Approximate computing; Arithmetic circuits; Logic design; Low-power design; Partial product reduction
- Research Article
1
- 10.29292/jics.v19i3.927
- Dec 23, 2024
- Journal of Integrated Circuits and Systems
- Vinicius Zanandrea + 2 more
Approximate computing can be effectively applied in error-tolerant applications to improve the energy efficiency of circuits. Arithmetic units such as adders and multipliers are the core components of many embedded devices and hardware accelerators. In particular, multipliers have a significant influence on the performance and power characteristics of the systems in which they are used. As a result, approximate multiplier design has become an important research subject in recent years. This work discusses the state of the art in the use of approximate computing to design energy-efficient multipliers. We observe that most of the proposed approximate multipliers rely on approximations in the partial product reduction, mainly exploiting approximate compressors or approximate adders. In addition, only a few works target neural networks, which shows that there is room to explore approximate multiplier design for this application. The information provided in this paper makes it possible to identify gaps in the literature and possibilities for new research on the design of low-power multipliers.
- Research Article
1
- 10.1142/s0218126625500665
- Oct 21, 2024
- Journal of Circuits, Systems and Computers
- Garima Thakur + 1 more
Energy-efficient error-tolerant circuits have opened up a whole new area of low-power applications based on approximate computing. Approximate computing trades exactness of computation for efficient performance. In this paper, a novel energy-efficient multiplier is proposed for image processing applications. In the multiplication process, compressors are an important component for the reduction of partial products. Higher-order approximate 5:2 and 6:2 compressors are also designed and simulated in VIVADO using Verilog. The proposed higher-order compressors result in less area and lower power consumption in comparison with the existing state-of-the-art technique. These high-performance compressors are used in the multipliers' reduction stage, resulting in an energy-efficient circuit for error-tolerant applications. All the simulations were carried out in VIVADO considering 8-bit inputs. Multiplication performance shows a 37.77% (8-bit) improvement in terms of power consumption in comparison to the conventional multiplier. The multiplication process has been applied to the original, negative, and sharpened images using their masks. The proposed multiplier shows 51.36% (original image), 6.04% (negative image), and 22.44% (sharpened image) PSNR improvement in comparison to state-of-the-art work.
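PSNR figures such as the ones above are conventionally computed from the mean squared error between a reference image and the image produced with the approximate multiplier. The sketch below shows the standard definition for 8-bit pixel data held in flat lists; the function name and the flat-list representation are assumptions for illustration, and the abstract does not specify how its PSNR values were obtained.

```python
import math


def psnr(reference, test, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means the approximate result is closer to the exact one, so a PSNR improvement indicates a smaller arithmetic error at the image level.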
- Research Article
5
- 10.1016/j.prime.2024.100698
- Jul 19, 2024
- e-Prime - Advances in Electrical Engineering, Electronics and Energy
- Ahsan Rafiq + 1 more
Multipliers are essential computation units in virtually all computing systems, including processors and numerous AI accelerator architectures. This paper presents an optimized architecture for a Booth multiplier, targeting high performance while minimizing energy consumption and area utilization. The design optimization focuses on all three multiplier stages: partial product generation, reduction, and summation. To enhance delay and energy efficiency in the partial product generation stage, we first employed a simplified configuration comprising inverters and a sign selection unit instead of complex binary-to-two's complement circuitry. Next, to achieve further delay and area efficiency at this stage, logic optimization is applied at the partial product's generation circuitry by designing Booth encoders to remove redundant logic in multiplexers circuitry. Moreover, we introduced specialized sign compressors tailored for carry-save compression in the compression stage. Compared to conventional counterparts, these compressors offered lower power consumption and reduced critical path delay with only two XOR logic gates. Finally, in the summation stage, we proposed an optimized design segment for Carry Look-Ahead Adder for the final summation stage, designed to deliver swift throughput with minimal fan-in logic gates, even in the context of high bit-width configurations. This segment is cascaded to make a 13-bit final adder for the summation stage in the proposed design. The proposed architecture undergoes ASIC-targeted synthesis in Cadence Genus employing FreePDK CMOS 45 nm process technology. Synthesized results, along with theoretical design complexity comparison, demonstrate that the proposed design surpasses state-of-the-art 8 × 8 multiplier designs by critical metrics, including delay, power consumption, area utilization, power delay product, and area delay product.
- Research Article
- 10.22214/ijraset.2024.63379
- Jun 30, 2024
- International Journal for Research in Applied Science and Engineering Technology
- Mr M Murali
Abstract: In the digital age, efficient computation is critical for high-performance digital signal processing (DSP) applications. This project focuses on designing and implementing a 32-bit Multiply and Accumulate (MAC) unit using the Dadda multiplier and Carry Save Adder (CSA). Traditional MAC units often struggle with speed, area efficiency, and power consumption. The Dadda multiplier, known for its efficient partial product reduction, is coupled with the CSA to minimize carry propagation delay, thus enhancing overall performance. The proposed MAC unit demonstrates superior speed, power efficiency, and reduced hardware complexity compared to conventional units. It is particularly suitable for applications like FIR and IIR filters, FFT, and neural networks, where high-speed arithmetic operations are essential. The design is synthesized and simulated using industry-standard tools, showcasing its effectiveness and suitability for various DSP applications. By reducing critical path delays and optimizing power consumption, the new MAC unit promises significant improvements in both computational speed and energy efficiency, meeting the growing demands of modern DSP systems.
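The benefit of a carry-save adder in a MAC loop is that the accumulator can stay in redundant (sum, carry) form, so no carry has to propagate until the very end. The sketch below models this at the integer level for non-negative operands; it is a behavioral illustration with hypothetical function names, not the design described in the abstract.

```python
def carry_save_add(a, b, c):
    """Bitwise 3:2 compression of three non-negative integers.
    Returns (s, carry) with a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def mac_carry_save(pairs):
    """Accumulate sum(x * y) while keeping the accumulator in carry-save form."""
    acc_sum, acc_carry = 0, 0
    for x, y in pairs:
        # In hardware the product would come from the Dadda tree; here it is
        # modeled with an integer multiply.
        acc_sum, acc_carry = carry_save_add(acc_sum, acc_carry, x * y)
    # A single carry-propagate addition resolves the redundant form.
    return acc_sum + acc_carry
```

For example, `mac_carry_save([(3, 4), (5, 6), (7, 8)])` accumulates 12 + 30 + 56 and resolves the carries only once.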
- Research Article
10
- 10.1016/j.vlsi.2024.102215
- May 28, 2024
- Integration
- Dinesh Kumar Jayaraman Rajanediran + 3 more
Hybrid Radix-16 booth encoding and rounding-based approximate Karatsuba multiplier for fast Fourier transform computation in biomedical signal processing application
- Research Article
2
- 10.37391/ijeer.120215
- Apr 30, 2024
- International Journal of Electrical and Electronics Research
- Perumal B + 4 more
This work presents a novel approach to improve the area and energy efficiency of the 5:3 counter, a key element used in digital arithmetic. To provide an effective substitute for addition operations, mostly in the partial product reduction stage of larger multipliers, this study proposes a new 5:3 counter. An Input Shuffling Unit (ISU) is employed within the proposed 5:3 counter to minimize the gate-level implementation and the path delay during partial product reduction in 16-bit and larger multipliers, thereby enhancing area and energy efficiency. Consequently, there are 84% fewer input-output combinations, which decreases the circuit complexity with respect to area and energy usage. When compared to its existing counterparts, the suggested 5:3 compressor improves area utilization and energy usage by an average of 11%, 17%, and 17% in 8-, 16-, and 32-bit multipliers, respectively. The simulation results demonstrate the superiority of our method over traditional designs, providing an increase in both area and energy efficiency. These results highlight the applicability and scalability of our method, which is appropriate for a variety of applications such as embedded systems and digital signal processing.
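For readers unfamiliar with the term, a 5:3 counter takes five bits of equal weight and outputs their population count as a 3-bit number. The sketch below is a plain exact gate-level model built from two full adders and a half adder; it does not include the Input Shuffling Unit or any other optimization from the paper, and the function names are hypothetical.

```python
from itertools import product


def full_adder(a, b, c):
    """Exact single-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def counter_5_3(a, b, c, d, e):
    """Exact 5:3 counter: returns (o2, o1, o0) with
    a + b + c + d + e == 4 * o2 + 2 * o1 + o0."""
    s1, c1 = full_adder(a, b, c)
    o0, c2 = full_adder(s1, d, e)
    # Half adder on the two weight-2 carries.
    o1 = c1 ^ c2
    o2 = c1 & c2
    return o2, o1, o0


# Exhaustive check over all 32 input combinations.
for bits in product((0, 1), repeat=5):
    o2, o1, o0 = counter_5_3(*bits)
    assert sum(bits) == 4 * o2 + 2 * o1 + o0
```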
- Research Article
2
- 10.1016/j.microrel.2023.115277
- Nov 17, 2023
- Microelectronics Reliability
- Sukanya Balasubramani + 2 more
Performance optimized approximate multiplier architecture ST-AxM - based on statistical analysis and static compensation
- Research Article
- 10.22214/ijraset.2023.56082
- Oct 31, 2023
- International Journal for Research in Applied Science and Engineering Technology
- Manchiryala Manogna + 1 more
Abstract: In this paper, we propose an approximate multiplier. Approximate computing (AC) offers benefits by reducing the requirement for accuracy, thereby reducing delay. The majority logic (ML) gate functions as the fundamental logic block of many emerging nanotechnologies. These adders are designed to prevent the propagation of inexact carry-out signals to higher-order computing parts to enhance accuracy. We implemented the proposed multiplier using a unique partial product reduction (PPR) circuitry based on the parallel approximate 6:3 compressor. Implementations in quantum-dot cellular automata (QCA) are analyzed to evaluate the adder designs. The experimental results show a significant improvement over previous designs. The proposed design is further implemented using a Kogge-Stone adder, which has the added advantage of reducing logic size, power, and delay. Verilog HDL and the Xilinx ISE14.8 software tools are used for simulation and synthesis.
- Research Article
- 10.29292/jics.v18i2.754
- Sep 27, 2023
- Journal of Integrated Circuits and Systems
- Vinicius Zanandrea + 3 more
With the rising importance of power consumption in battery-powered devices, approximate computing techniques have emerged as a promising approach to strike a balance between exact computation and power savings, leading to improved delays. This paper investigates the combination of near-threshold operation and approximate adders to design power-efficient multipliers. We analyzed four multiplier architectures using 16 nm low-power and high-performance models. At the transistor level, three strategies for approximate full adders are explored, focusing on both partial product reduction and the final addition stage of the multipliers. Eleven test cases are thoroughly evaluated to identify the most suitable approximate circuit, considering the trade-offs among power, performance, and accuracy. The obtained results demonstrate a substantial reduction in power consumption at near-threshold operation. The replacement of exact full adders with the approximate copy strategy in the least significant bits of the multipliers leads to a reduction of up to 34.4% in power consumption and 19.2% in delay. The design-space exploration carried out in this study provides valuable insights for designers to choose the best approximate multiplier based on specific design requirements.
- Research Article
11
- 10.1016/j.micpro.2023.104909
- Aug 2, 2023
- Microprocessors and Microsystems
- Pramod Alamuri + 3 more
Improved approximate multiplier architecture for image processing and neural network applications
- Research Article
4
- 10.1049/tje2.12296
- Aug 1, 2023
- The Journal of Engineering
- Naga Venkata Vijaya Krishna Boppana + 1 more
Abstract The continued quest for finding a low‐power and high‐performance hardware algorithm for signed number multiplication led to designing a simple and novel radix‐8 signed number multiplier with 3‐bit grouping and partial product reduction performed using magnitudes of the multiplicand and the multiplier. The pre‐computation stage constitutes magnitude calculation and non‐trivial computations required to generate partial products. A new partial product reduction strategy is deployed in the design to improve the speed with low cost. 8×8, 16×16, 32×32, and 64×64 designs are presented for the proposed architectures. Performance results include area, power, delay, and power‐delay‐product of synthesized and post‐layout designs using 32 nm CMOS technology with 1.05 V supply voltage.
- Research Article
9
- 10.1016/j.vlsi.2023.102055
- Jun 30, 2023
- Integration
- Jeyakumar Ponraj + 3 more
High-performance multiply-accumulate unit by integrating binary carry select adder and counter-based modular wallace tree multiplier for embedding system