Tree Multiplier Research Articles

Low power is an imperative requirement for portable multimedia devices employing various signal processing algorithms and architectures. In most multimedia applications, human beings can gather useful information from slightly erroneous outputs, so exactly correct numerical outputs are not required. Previous research in this context exploits error resiliency primarily through voltage overscaling, utilizing algorithmic and architectural techniques to mitigate the resulting errors. In this paper, we propose logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy. We demonstrate this concept by proposing various imprecise or approximate full adder cells with reduced complexity at the transistor level, and utilize them to design approximate multi-bit adders. In addition to the inherent reduction in switched capacitance, our techniques result in significantly shorter critical paths, enabling voltage scaling.

Keywords: Approximate computing, low power, mirror adder, Booth multiplier, Wallace tree multiplier

I. INTRODUCTION

Digital signal processing (DSP) blocks form the backbone of various multimedia applications used in portable devices. Most of these DSP blocks implement image and video processing algorithms, where the ultimate output is either an image or a video for human consumption. Human beings have limited perceptual abilities when interpreting an image or a video. This allows the outputs of these algorithms to be numerically approximate rather than accurate. This relaxation on numerical exactness provides some freedom to carry out imprecise or approximate computation. We can use this freedom to come up with low-power designs at different levels of design abstraction, namely, logic, architecture, and algorithm.

The paradigm of approximate computing is specific to select hardware implementations of DSP blocks. It is shown in (1) that an embedded reduced instruction set computing (RISC) processor consumes 70% of its energy in supplying data and instructions, and only 6% in performing arithmetic. Therefore, using approximate arithmetic in such a scenario will not provide much energy benefit when considering the complete processor. Programmable processors are designed for general-purpose applications with no application-specific specialization, so there may not be many applications on them that can tolerate the errors introduced by approximate computing. This also makes general-purpose processors poorly suited to approximate building blocks. This issue has already been discussed in (13). Therefore, in this paper, we consider application-specific integrated circuit (ASIC) implementations of error-resilient applications like image and video compression.
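The idea of building a multi-bit adder from approximate full adder cells can be illustrated with a small bit-level model. This is only a sketch under stated assumptions: the abstract does not give the truth tables of its cells, so the approximate cell below uses one well-known simplification (the carry is computed exactly and the sum is taken as its complement, which is wrong only for inputs 000 and 111). The function names `exact_full_adder`, `approx_full_adder`, and `ripple_add` are hypothetical and chosen for illustration.

```python
def exact_full_adder(a, b, cin):
    """Conventional full adder: returns (sum, carry_out) for single bits."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def approx_full_adder(a, b, cin):
    """Illustrative approximate cell (an assumption, not the paper's cell).

    The carry is exact; the sum is approximated as NOT carry_out.
    This matches the exact sum for 6 of the 8 input combinations.
    """
    cout = (a & b) | (cin & (a ^ b))
    return 1 - cout, cout


def ripple_add(x, y, width, approx_bits=0):
    """Add two unsigned `width`-bit integers with a ripple-carry chain.

    The `approx_bits` least significant positions use the approximate
    cell; the remaining positions use the exact cell.
    """
    carry = 0
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        cell = approx_full_adder if i < approx_bits else exact_full_adder
        s, carry = cell(a, b, carry)
        result |= s << i
    # The final carry becomes the extra most significant bit of the result.
    return result | (carry << width)
```

Because this particular cell keeps the carry exact, any error stays confined to the `approx_bits` low-order sum bits, so the absolute error is below `2 ** approx_bits`. Other approximate cells that also simplify the carry would not have this property.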

This article proposes an effective way of implementing a multiply-accumulate (MAC) circuit for high-speed floating point arithmetic operations. Real-world applications in digital signal processing and related fields demand high-performance computation with greater accuracy. In general, digital signals are represented as a sequence of signed or unsigned, fixed or floating point numbers. The final result of a MAC operation can be computed by feeding the mantissa of the previous MAC result as one of the partial products to a Wallace tree multiplier or Braun multiplier. Thus, a separate accumulation circuit can be avoided while keeping the circuit depth within the bounds of the Wallace tree multiplier, namely O(log2 n), or the Braun multiplier, namely O(n). In this article, three kinds of floating point MACs are proposed. The experimental results show a 48.54% improvement in worst path delay for the proposed floating point MAC using a radix-2 Wallace structure compared with a conventional floating point MAC without a pipeline, using a 45 nm technology library. The same proposed design gives a 39.92% improvement in worst path delay without a pipeline using a radix-4 Braun structure compared with a conventional design. In this article, a radix-32 Q32.32-format-based floating point MAC is proposed using a Wallace tree/Braun multiplier. This article also discusses the MSB prediction problem and its solution in floating point arithmetic, which is not available in modern fused multiply-add designs. The performance results compare the proposed floating point MAC with various floating point MAC designs for radix-2, -4, -8, and -16. The proposed design has a smaller depth than a conventional floating point MAC as well as a lower area requirement than other floating point MAC implementations, both with and without a pipeline.
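The central idea of the abstract above, feeding the previous MAC result into the multiplier's reduction tree as one more partial product, can be sketched on plain unsigned integers. This is a simplified model under stated assumptions: it ignores floating point concerns such as exponent alignment, normalization, and rounding, it uses a radix-2 partial product array, and the names `partial_products`, `compress_3_2`, `wallace_reduce`, and `mac` are hypothetical.

```python
def partial_products(x, y, width):
    """Radix-2 partial products of x * y: one shifted copy of x per bit of y."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(width)]


def compress_3_2(a, b, c):
    """Carry-save (3:2) compression: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def wallace_reduce(rows):
    """Reduce a list of rows to at most two, counting compressor levels."""
    levels = 0
    while len(rows) > 2:
        nxt = []
        full = len(rows) - len(rows) % 3  # rows that form complete groups of 3
        for i in range(0, full, 3):
            nxt.extend(compress_3_2(*rows[i:i + 3]))
        nxt.extend(rows[full:])  # leftover rows pass through to the next level
        rows = nxt
        levels += 1
    return rows, levels


def mac(x, y, acc, width):
    """Compute x * y + acc by adding acc as an extra row in the tree."""
    rows = partial_products(x, y, width) + [acc]
    rows, levels = wallace_reduce(rows)
    # A single carry-propagate addition finishes the computation.
    return sum(rows), levels
```

In this model, an 8-bit multiplication reduces 8 rows in 4 compressor levels, and adding the accumulator as a ninth row also takes 4 levels, so `mac(5, 3, 7, 8)` returns `(22, 4)`. That illustrates, for this width, how accumulation can be folded into the tree without a separate adder stage.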

Articles published on Tree Multiplier

Modified low power Wallace Tree multiplier using higher order compressors

Low Power, High Speed and Area Efficient Binary Count Multiplier

Performance improvement in tree multiplier using full swing GDI logic based CLA adder

Area Efficient and Fast Combined Binary/Decimal Floating Point Fused Multiply Add Unit

An optimized embedded adder for digital signal processing applications

High Speed Area Efficient 32 Bit Wallace Tree Multiplier

A Low Power Multiplier using a 24-Transistor Latch Adder

FPGA Implementation of a High Speed Multiplier Employing Carry Lookahead Adders in Reduction Phase

Design and Implementation of Booth Multiplier using Approximate Adders

A high-speed multiplexer-based fine-grain pipelined architecture for digital fuzzy logic controllers

Design of Digital FIR Filter Based on MCMAT for 12 bit ALU using DADDA & WALLACE Tree Multiplier

An Efficient Hardware-Based Higher Radix Floating Point MAC Design

AREA EFFICIENT DISTRIBUTED ARITHMETIC DISCRETE COSINE TRANSFORM USING MODIFIED WALLACE TREE MULTIPLIER

Performance Analysis of Different Multipliers using Square Root Carry Select Adders

Carry-based reduction parallel counter design

Models for characterizing noise based PCMOS circuits

Spintronic Threshold Logic Array (STLA)—A compact, low leakage, non-volatile gate array architecture

Binary Multiplication Using Hybrid MOS and Multi-Gate Single-Electron Transistors
