Energy Efficient Error Resilient Multiplier Using Low-power Compressors
Approximate hardware design can save substantial energy at the cost of errors incurred in the design. This article proposes an approximate algorithm for low-power compressors, which are used to build approximate multipliers with low energy and acceptable error profiles. This article presents two design approaches (DA1 and DA2) for higher bit-size approximate multipliers. The proposed DA1 multiplier has no propagation of the carry signal from LSB to MSB, resulting in a very high-speed design. The increase in delay, power, and energy is not exponential with increasing multiplier size (n) for the DA1 multiplier. It can be observed that most input combinations lie within the threshold Error Distance of 5% of the maximum value possible for any particular multiplier of size n. The proposed 4-bit DA1 multiplier consumes only 1.3 fJ of energy, which is 87.9%, 78%, 94%, 67.5%, and 58.9% less than the M1, M2, LxA, MxA, and accurate designs, respectively. The DA2 approach is a recursive method, i.e., an n-bit multiplier is built with n/2-bit sub-multipliers. The proposed 8-bit multiplication has 92% energy savings with a Mean Relative Error Distance (MRED) of 0.3 for the DA1 approach and at least 11% to 40% energy savings with an MRED of 0.08 for the DA2 approach. The proposed multipliers are employed in the DCT image processing algorithm, and the quality is evaluated. The standard PSNR metric is 55 dB for less approximation and 35 dB for maximum approximation.
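The recursive construction described for DA2 (an n-bit multiplier built from n/2-bit sub-multipliers) can be sketched in Python as follows. This is only a behavioral illustration of the decomposition, not the article's circuit: the function names are hypothetical, and the approximate 2 × 2 base block (3 × 3 returns 7 instead of 9) is a stand-in assumption rather than the proposed compressor-based design.

```python
def mul2x2_exact(a, b):
    """Exact 2-bit x 2-bit base multiplier."""
    return a * b


def mul2x2_approx(a, b):
    """Hypothetical approximate base block: 3 x 3 yields 7 instead of 9,
    so the product fits in 3 bits. Not the article's design."""
    return 7 if (a == 3 and b == 3) else a * b


def recursive_mul(a, b, n, base=mul2x2_exact):
    """Multiply two unsigned n-bit values (n a power of two, n >= 2)
    using four n/2-bit sub-multiplications."""
    if n == 2:
        return base(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a_hi, a_lo = a >> h, a & mask
    b_hi, b_lo = b >> h, b & mask
    hh = recursive_mul(a_hi, b_hi, h, base)
    hl = recursive_mul(a_hi, b_lo, h, base)
    lh = recursive_mul(a_lo, b_hi, h, base)
    ll = recursive_mul(a_lo, b_lo, h, base)
    # A*B = HH*2^n + (HL + LH)*2^(n/2) + LL
    return (hh << n) + ((hl + lh) << h) + ll
```

With the exact base block the recursion reproduces `a * b`; swapping in an approximate base block shows how a small error in the sub-multipliers propagates to the full-width product.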
- Research Article
9
- 10.1016/j.vlsi.2023.04.006
- Apr 24, 2023
- Integration
Energy efficient multiply-accumulate unit using novel recursive multiplication for error-tolerant applications
- Research Article
11
- 10.1016/j.memori.2022.100017
- Oct 12, 2022
- Memories - Materials, Devices, Circuits and Systems
High-performance, energy-efficient, and memory-efficient FIR filter architecture utilizing 8x8 approximate multipliers for wireless sensor network in the Internet of Things
- Research Article
39
- 10.1016/j.vlsi.2023.102084
- Sep 8, 2023
- Integration
Efficient and low-cost approximate multipliers for image processing applications
- Conference Article
15
- 10.1109/icfpt56656.2022.9974399
- Dec 5, 2022
With the increasing demand for data processing, approximate computing is widely used in various fault-tolerant applications such as image processing, computer vision, and machine learning. These applications also require a huge number of multiplication operations. In this paper, we focus on softcore approximate multipliers implemented on FPGAs by encoding the INIT parameter values in the Look-Up-Table (LUT) primitives. Three approximate multipliers with an associated carry chain are presented, obtained by removing LUTs from the proposed exact multiplier. An approximate multiplier without a carry chain is also presented to further reduce the multiplier's critical path delay and power consumption. We also present an accuracy-configurable adder to build high-order approximate multipliers for architectural space exploration. The resolution of the state-of-the-art Mean Relative Error Distance (MRED) and Power-Delay Product (PDP) Pareto front is improved, and the approximate multiplier we propose achieves 24.4%, 52.9%, and 56.4% reductions in latency, area, and power over the soft multiplier IP core, respectively. Finally, we apply the proposed approximate multiplier design to image processing and convolutional neural networks (CNNs). Compared to advanced approximate multipliers, it offers lower energy consumption and area while maintaining acceptable quality. Our designs are open sourced at https://github.com/Yaoshangshang96/FPGA-based_approx_mult to support reproduction and further development.
- Conference Article
26
- 10.1109/asp-dac47756.2020.9045546
- Jan 1, 2020
Approximate multiplier design is an effective technique to improve hardware performance at the cost of accuracy loss. Current approximate multipliers are mostly ASIC-based and are dedicated to one particular application. In contrast, the FPGA has been an attractive choice for many applications because of its high performance, reconfigurability, and fast development. This paper presents a novel methodology for designing approximate multipliers by employing FPGA-based fabrics. The area and latency are significantly reduced by cutting the carry propagation path in the multiplier. Moreover, we explore higher-order multipliers in the architectural space by using our proposed small-size approximate multipliers as elementary modules. For different accuracy requirements, eight configurations for an approximate 8 × 8 multiplier are discussed. In terms of mean relative error distance (MRED), the accuracy loss of the proposed 8 × 8 multiplier is as low as 0.17%. Compared with the exact multiplier, our proposed design can reduce area by 43.66% and power by 20.36%. The critical path latency reduction is up to 27.66%. The proposed multiplier design has a better accuracy-hardware tradeoff than other designs with comparable accuracy.
- Research Article
12
- 10.1016/j.mejo.2023.105783
- Apr 13, 2023
- Microelectronics Journal
Design and analysis of leading one/zero detector based approximate multipliers
- Research Article
220
- 10.1109/jetcas.2018.2832204
- Sep 1, 2018
- IEEE Journal on Emerging and Selected Topics in Circuits and Systems
Approximate computing has been considered to improve the accuracy-performance tradeoff in error-tolerant applications. For many of these applications, multiplication is a key arithmetic operation. Given that approximate compressors are a key element in the design of power-efficient approximate multipliers, we first propose an initial approximate 4:2 compressor that introduces a rather large error to the output. However, the number of faulty rows in the compressor's truth table is significantly reduced by encoding its inputs using generate and propagate signals. Based on this improved compressor, two 4 × 4 multipliers are designed with different accuracies and are then used as building blocks for scaling up to 16 × 16 and 32 × 32 multipliers. According to the mean relative error distance (MRED), the most accurate of the proposed 16 × 16 unsigned designs has a 44% smaller power-delay product (PDP) compared to other designs with comparable accuracy. The radix-4 signed Booth multiplier constructed using the proposed compressor achieves a 52% reduction in the PDP-MRED product compared to other approximate Booth multipliers with comparable accuracy. The proposed multipliers outperform other approximate designs in image sharpening and joint photographic experts group applications by achieving higher quality outputs with lower power consumption. For the first time, we show the applicability and practicality of approximate multipliers in multiple-input multiple-output antenna communication systems with error control coding.
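To make the notion of "faulty rows in the compressor's truth table" concrete, the sketch below models an exact 4:2 compressor as two cascaded full adders and counts the truth-table rows on which a candidate approximate compressor disagrees with the exact sum. The approximate design shown (`toy_approx_4_2`) is invented purely for illustration and is not the compressor proposed in the paper; all function names are assumptions.

```python
from itertools import product


def full_adder(a, b, c):
    """One-bit full adder returning (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def exact_compressor_4_2(x1, x2, x3, x4, cin):
    """Exact 4:2 compressor: x1+x2+x3+x4+cin == sum + 2*(carry + cout)."""
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout


def toy_approx_4_2(x1, x2, x3, x4):
    """Hypothetical approximate compressor without cin/cout.
    Illustrative only; not taken from the paper."""
    s = (x1 ^ x2) | (x3 ^ x4)
    carry = (x1 & x2) | (x3 & x4)
    return s, carry


def faulty_rows(approx):
    """Input rows where sum + 2*carry differs from the true bit count."""
    rows = []
    for bits in product((0, 1), repeat=4):
        s, carry = approx(*bits)
        if s + 2 * carry != sum(bits):
            rows.append(bits)
    return rows
```

Calling `len(faulty_rows(toy_approx_4_2)) / 16` gives the error rate of the toy design over its 16 input combinations; the same helper can be pointed at any other four-input candidate.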
- Conference Article
3
- 10.1109/ibcast54850.2022.9990566
- Aug 16, 2022
Energy-efficient and high-performance general-purpose compute engines, as well as application-specific integrated circuits, are in high demand to facilitate the development of artificial intelligence and big data processing applications. However, with the end of Dennard scaling and Moore's law, it is becoming difficult to handle the massive amounts of data and complex computations required in these applications. Approximate computing (AC) has emerged as an attractive paradigm in digital design to address this unprecedented challenge. AC is driven by the observation that many state-of-the-art applications, such as classification, machine learning, data mining, robotics, and communication, exhibit error-tolerant characteristics; therefore, a small amount of error can be introduced, trading off the requirement of exact computation, to achieve area, power, and speed benefits. AC techniques can be applied at both the software and hardware layers. At the hardware layer, arithmetic units (multipliers, adders, and dividers) are considered as hardware computational modules. Therefore, approximation at the hardware layer has focused on the design of approximate arithmetic units. This paper presents approximate multipliers based on novel 4:2 compressors for error-tolerant applications. The proposed 4:2 compressors exhibit zero-mean error behavior while having hardware utilization comparable to the existing state-of-the-art designs. Both hardware-efficient and error-efficient designs of variable accuracy-power have been investigated to explore the maximum trade-off. All the designs are synthesized using the Cadence Genus synthesis tool (TSMC 65 nm technology), and power is reported using the Cadence Joules RTL power solution. A comprehensive error analysis is performed using well-known error metrics such as error distance (ED), mean error distance (MED), mean relative error distance (MRED), and normalized mean error distance (NMED). Moreover, all the designs are also compared with respect to power-delay product (PDP) and MRED to identify which designs lie on the error-energy Pareto-optimal curve. A case study is also presented to demonstrate the applicability of the proposed designs in a practical image processing application.
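The error metrics named in this abstract (ED, MED, MRED, NMED) can be computed exhaustively for small multipliers. The sketch below assumes the definitions commonly used in this literature: ED is the absolute difference from the exact product, MED is its mean, NMED is MED divided by the maximum exact output, and MRED averages ED divided by the exact product over inputs whose exact product is nonzero (conventions for the zero case vary). The `truncated_mul` design is a hypothetical example, not one of the paper's multipliers.

```python
def error_metrics(approx_mul, n):
    """Exhaustively evaluate an unsigned n-bit approximate multiplier."""
    max_out = ((1 << n) - 1) ** 2
    total_ed = 0
    total_red = 0.0
    errors = 0
    nonzero = 0
    count = 0
    for a in range(1 << n):
        for b in range(1 << n):
            exact = a * b
            ed = abs(approx_mul(a, b) - exact)  # error distance
            total_ed += ed
            errors += ed != 0
            count += 1
            if exact != 0:  # assumption: skip zero products for MRED
                total_red += ed / exact
                nonzero += 1
    med = total_ed / count
    return {
        "ER": errors / count,
        "MED": med,
        "NMED": med / max_out,
        "MRED": total_red / nonzero,
    }


def truncated_mul(a, b, drop=2):
    """Hypothetical approximate multiplier: clears the low `drop` bits
    of each operand before multiplying."""
    return ((a >> drop) << drop) * ((b >> drop) << drop)
```

For example, `error_metrics(truncated_mul, 8)` returns all four figures for the toy truncation design, and `error_metrics(lambda a, b: a * b, 8)` returns zeros for an exact multiplier.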
- Conference Article
39
- 10.1109/apccas.2018.8605570
- Oct 1, 2018
Approximate computing can improve hardware performance by sacrificing some accuracy in error-tolerant applications, where multiplication is a key arithmetic operation. In this paper, we propose a low-cost approximate multiplier design that employs new probability-driven inexact 4:2, 6:2, and 8:2 compressors and inexact half-adders. This compressor design is explored to reduce the height of the partial product matrix to two rows. Different levels of accuracy can be achieved through a grouped error recovery scheme that employs different numbers of error compensation vectors for error reduction. The mean relative error distance (MRED) of the proposed multiplier design ranges from 1.07% to 7.86%. Compared with the Wallace multiplier using the SMIC 40 nm process, the most accurate variant of the proposed design reduces power by 50.52%, area by 52.46%, and delay by 33.90%. The proposed multiplier design has a better accuracy-performance trade-off than other designs. Moreover, the efficiency of approximate multipliers is assessed in an image processing application.
- Research Article
126
- 10.1109/tc.2019.2926275
- Nov 1, 2019
- IEEE Transactions on Computers
Approximate computing is an emerging technique in which power-efficient circuits are designed with reduced complexity in exchange for some loss in accuracy. Such circuits are suitable for applications in which high accuracy is not a strict requirement. Radix-4 modified Booth encoding is a popular multiplication algorithm which reduces the size of the partial product array by half. In this paper, three Approximate Booth Multiplier Models (ABM-M1, ABM-M2, and ABM-M3) are proposed in which approximate computing is applied to the radix-4 modified Booth algorithm. Each of the three designs features a unique approximation technique that involves both reducing the logic complexity of the Booth partial product generator and modifying the method of partial product accumulation. The proposed approximate multipliers are demonstrated to have better performance than existing approximate Booth multipliers in terms of accuracy and power. Compared to the exact Booth multiplier, ABM-M1 achieves up to a 23 percent reduction in area and a 15 percent reduction in power with a Mean Relative Error Distance (MRED) value of $7.9\times 10^{-4}$. ABM-M2 has area and power savings of up to 51 and 46 percent, respectively, with an MRED of $2.7\times 10^{-2}$. ABM-M3 has area savings of up to 56 percent and power savings of up to 46 percent with an MRED of $3.4\times 10^{-3}$. The proposed designs are compared with the state-of-the-art existing multipliers and are found to outperform them in terms of area and power savings while maintaining high accuracy. The performance of the proposed designs is demonstrated using image transformation, matrix multiplication, and Finite Impulse Response (FIR) filtering applications.
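The statement that radix-4 modified Booth encoding halves the partial product array can be illustrated with a short exact model. The sketch below recodes an n-bit two's complement multiplier into n/2 digits from {-2, -1, 0, 1, 2}, each of which selects one partial product. It models the standard exact algorithm only; the ABM-M1 to ABM-M3 approximations are not reproduced, and the function names are assumptions.

```python
def booth_radix4_digits(b, n):
    """Recode signed n-bit b (n even) into n/2 radix-4 Booth digits,
    least significant first. Each digit is in {-2, -1, 0, 1, 2}."""
    assert n % 2 == 0
    u = b & ((1 << n) - 1)  # two's complement bit pattern of b
    digits = []
    prev = 0  # implicit bit b[-1] = 0
    for i in range(0, n, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(a, b, n=8):
    """Exact signed product formed from n/2 Booth partial products."""
    return sum(d * a * 4 ** i for i, d in enumerate(booth_radix4_digits(b, n)))
```

For an 8-bit multiplier this yields four partial products instead of eight, for example `booth_radix4_digits(7, 8)` gives `[-1, 2, 0, 0]`, since 7 = -1 + 2·4.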
- Research Article
4
- 10.1142/s0218126621501383
- Nov 20, 2020
- Journal of Circuits, Systems and Computers
Low power dissipation in approximate arithmetic circuits has laid the foundation for area-efficient computational units for error-resilient applications like image and signal processing. This paper proposes two novel low-power, high-speed architectures for an approximate 4:2 compressor that can be employed in multipliers for partial product summation. The two designs presented ([Formula: see text] and [Formula: see text]) have an Error Distance (ED) of [Formula: see text] and an Error Rate (ER) of 25%. The proposed [Formula: see text] and [Formula: see text] are able to achieve reductions in power and delay of (62.50%, 47.67%) and (83.13%, 60.20%), respectively, in comparison with the exact 4:2 compressor. To verify the effectiveness of the design, the proposed architectures are used to implement a [Formula: see text] Dadda multiplier. The equal number of errors in the positive and negative directions in the proposed designs aids in reducing the Mean Error Distance (MED) and Mean Relative Error Distance (MRED) of the multiplier. Multiplication of images and two-level decomposition of 2D Haar wavelets are implemented using the designed Dadda multiplier. The efficiency of the image processing applications is measured in terms of the Mean Structural Similarity (MSSIM) index and Peak Signal-to-Noise Ratio (PSNR), and an average of 0.98 and 35[Formula: see text]dB, respectively, is obtained, which are in the acceptable range. In addition, a Convolutional Neural Network (CNN)-based LeNet-1 Handwritten Digit Recognition System (HDRS) is implemented using the proposed compressor-based multipliers. The proposed compressor-based architectures are able to achieve an average accuracy of 96.23%.
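The PSNR figure used here to judge image quality follows the standard definition, PSNR = 10·log10(peak² / MSE). A minimal sketch is given below, assuming 8-bit images passed as flattened sequences of pixel values; the function name and argument layout are illustrative choices, not taken from the paper.

```python
import math


def psnr(reference, test, peak=255):
    """Peak Signal-to-Noise Ratio in dB between two equally sized
    flattened images (sequences of pixel values)."""
    if len(reference) != len(test) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)
```

In a typical evaluation, `reference` would hold the output produced with an exact multiplier and `test` the output produced with the approximate one.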
- Research Article
51
- 10.3390/electronics9030471
- Mar 11, 2020
- Electronics
This paper presents an energy-efficient approximate adder with a novel hybrid error reduction scheme to significantly improve the computation accuracy at the cost of extremely low additional power and area overheads. The proposed hybrid error reduction scheme utilizes only two input bits and adjusts the approximate outputs to reduce the error distance, which leads to an overall improvement in accuracy. The proposed design, when implemented in 65-nm CMOS technology, has 3, 2, and 2 times greater energy, power, and area efficiencies, respectively, than conventional accurate adders. In terms of accuracy, the proposed hybrid error reduction scheme reduces the error rate of the proposed adder to 50%, whereas those of the lower-part OR adder and optimized lower-part OR constant adder reach 68% and 85%, respectively. Furthermore, the proposed adder has up to 2.24, 2.24, and 1.16 times better performance with respect to the mean error distance, normalized mean error distance (NMED), and mean relative error distance, respectively, than the other approximate adders considered in this paper. Importantly, because of an excellent design tradeoff among delay, power, energy, and accuracy, the proposed adder is found to be the most competitive approximate adder when jointly analyzed in terms of hardware cost and computation accuracy. Specifically, our proposed adder achieves 51%, 49%, and 47% reductions of the power-, energy-, and error-delay-product-NMED products, respectively, compared to the other considered approximate adders.
- Book Chapter
4
- 10.1007/978-981-19-8742-7_42
- Jan 1, 2023
Approximate computing has emerged as a promising solution for faster, more energy-efficient, and less complex circuit designs. Approximate arithmetic circuits achieve power and area efficiency by intentionally introducing imperfections into the circuit's output behavior. In arithmetic circuits, the adder plays a prominent role. It has become essential to understand approximation techniques and methods to enhance performance and efficiency. This paper aims to provide a comprehensive review of approximate adders, which are comparatively assessed in terms of error and of performance based on speed, area, and power. The arithmetic circuits are implemented and synthesized using HDLs and design compiler, and error characterization is done using MATLAB. In this paper, power, speed, and area are compared with respect to error distance, normalized mean error distance, and mean relative error distance. The comparative results show that the equal segmentation adder has low accuracy but is a hardware-efficient design. The evaluation shows that the equally accurate adders are the error-tolerant adder type II, the speculative carry select adder, and the accuracy-configurable approximate adder. The most power-consuming adder is the almost correct adder. Among all adders, the lower-part OR adder is the slowest yet extremely efficient.
Keywords: Approximate computing, Arithmetic circuits, Adder, Error characteristics, Evaluation
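As a concrete instance of one adder named in this review, the sketch below gives a behavioral model of a lower-part OR adder (LOA) in the form usually described in the literature: the low `k` bits are approximated by a bitwise OR, and the upper bits are added exactly with a carry-in taken from the AND of the most significant lower-part bits. The parameter names and default widths are assumptions, and the model says nothing about the hardware costs reported in the chapter.

```python
def loa_add(a, b, n=8, k=4):
    """Behavioral lower-part OR adder for unsigned n-bit operands.
    The low k bits (1 <= k < n) are OR-ed; the high n-k bits are
    added exactly with a predicted carry-in."""
    assert 1 <= k < n
    mask = (1 << k) - 1
    low = (a | b) & mask
    # Carry prediction: AND of the top bits of the two lower parts.
    carry_in = (a >> (k - 1)) & (b >> (k - 1)) & 1
    high = (a >> k) + (b >> k) + carry_in
    return (high << k) | low


def loa_error_rate(n=8, k=4):
    """Fraction of input pairs for which loa_add differs from a + b."""
    total = 1 << (2 * n)
    wrong = sum(loa_add(a, b, n, k) != a + b
                for a in range(1 << n) for b in range(1 << n))
    return wrong / total
```

The result is exact whenever the two lower parts share no set bits, for example `loa_add(0x35, 0x4A)` equals `0x7F`, and it deviates otherwise, as in `loa_add(0x0F, 0x01)`, which returns 15 rather than 16.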
- Research Article
49
- 10.1109/access.2021.3108443
- Jan 1, 2021
- IEEE Access
This paper proposes a novel approximate adder that exploits an error-reduced carry prediction and constant truncation with error reduction schemes. The proposed adder design techniques significantly improve overall computation accuracy while providing excellent hardware efficiency. In particular, the proposed carry prediction technique can reduce the prediction error rate by up to 75% compared to the existing approximate adders considered in this paper. Furthermore, the error reduction technique also enhances the overall computation accuracy by decreasing the error distance (ED). Our experimental results show that the proposed adder improves the normalized mean ED (NMED) and mean relative ED (MRED) by up to 91.4% and 98.9%, respectively, compared to the other approximate adders. Importantly, an excellent design tradeoff allows the proposed adder to be the most competitive of the adders under consideration. Specifically, the proposed adder achieves up to 95.7%, 91.1%, and 93.2% reductions of the power-NMED, energy-NMED, and area-delay product (ADP)-NMED products, respectively, compared to the other adders. Our adder enhances the power-, energy-, and ADP-MRED products by up to 99.4% compared to the others. In particular, the figure of merit (FoM) considering both hardware and accuracy of the proposed adder is up to 93.05% smaller than that of the other approximate adders considered herein. Furthermore, we confirm that the approximation errors caused by the proposed adder have very little impact on output quality when adopted in practical applications, such as digital image processing and machine learning.
- Research Article
14
- 10.1109/tetc.2020.2989699
- Apr 24, 2020
- IEEE Transactions on Emerging Topics in Computing
The squaring function is widely used in Digital Signal Processing (DSP). There are many DSP applications with noisy inputs for which simplifying approximations of the squaring function implementation have a minor impact on the output quality, while permitting significant reductions in the hardware cost. This article proposes a Low-Error Squaring Function (LESF) and its low-power hardware implementation. Unlike the existing logarithmic squaring functions, LESF benefits from a double-sided error distribution and, consequently, error cancellation in larger calculations. LESF approximates a base-2 logarithmic function with a linear polynomial, i.e., $\log_2 f(x) \approx ax+b$. Since input $b$ in this sum is a constant, LESF replaces the conventional full-adder with a compact specialized adder for hardware efficiency. Our simulation results show that the 16-bit LESF is 23.23 percent more accurate (in the mean relative error distance) than the baseline Mitchell approximate logarithmic squaring function while being 1.8× faster and 39 percent more energy-efficient. LESF and other logarithmic squaring functions are evaluated for the square-law detector application. LESF is shown to be more than 3× more accurate in this application (with respect to the Euclidean distance) than the next most accurate design in the literature, which uses an iterative error compensation technique.
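For context on the baseline mentioned above, the following is a behavioral sketch of Mitchell-style logarithmic squaring in integer arithmetic. Writing x = 2^k (1 + m) with 0 ≤ m < 1, Mitchell's approximation takes log2(x) ≈ k + m, doubles it, and applies the same linear rule for the antilogarithm. This models only the baseline idea, not LESF or any hardware from the article, and the function name is an assumption.

```python
def mitchell_square(x):
    """Approximate x*x for a positive integer x using Mitchell's
    linear log2/antilog approximation."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    k = x.bit_length() - 1      # characteristic: floor(log2(x))
    frac = x - (1 << k)         # mantissa m = frac / 2^k
    if 2 * frac < (1 << k):
        # 2m < 1: result = 2^(2k) * (1 + 2m)
        return (1 << (2 * k)) + (frac << (k + 1))
    # 2m >= 1: result = 2^(2k+1) * (1 + (2m - 1)) = frac * 2^(k+2)
    return frac << (k + 2)
```

In this model the result never exceeds the true square and the relative error stays within 1/9 (about 11.1%), for example `mitchell_square(3)` gives 8 instead of 9. This one-sided error is the behavior that the abstract contrasts with a double-sided error distribution.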