Reduced-Area Constant-Coefficient and Multiple-Constant Multipliers for Xilinx FPGAs with 6-Input LUTs

E George Walters

doi:10.3390/electronics6040101

https://doi.org/10.3390/electronics6040101

Journal: Electronics	Publication Date: Nov 22, 2017
Citations: 8	License type: CC BY 4.0

Abstract

Multiplication by a constant is a common operation for many signal, image, and video processing applications that are implemented in field-programmable gate arrays (FPGAs). Constant-coefficient multipliers (KCMs) are often implemented in the logic fabric using lookup tables (LUTs), reserving embedded hard multipliers for general-purpose multiplication. This paper describes a two-operand addition circuit from previous work and shows how it can be used to generate and add pre-computed partial products to implement KCMs. A novel method for pre-computing partial products for KCMs with a negative constant is also presented. These KCMs are then extended to have two to eight coefficients that may be selected by a control signal at runtime to implement time-multiplexed multiple-constant multiplication. Synthesis results show that the proposed pipelined KCMs use 27.4% fewer LUTs on average and have a median LUT-delay product that is 12% lower than that of comparable LogiCORE IP KCMs. The proposed pipelined KCMs with two to eight selectable coefficients use 46% to 70% fewer LUTs than the best LogiCORE IP-based alternative, and most are faster than a LogiCORE IP multiplier with a coefficient lookup function. They also outperform the state of the art in the literature, using 22% to 57% fewer slices than the smallest pipelined adder graph (PAG) fusion designs and operating 7% to 30% faster than the fastest PAG fusion designs for the same operand size and number of selectable coefficients. For KCMs and KCMs with selectable coefficients of a given operand size, the placement and routing of LUTs remain the same for all positive and negative constant values, which is advantageous for runtime partial reconfiguration.
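
The general idea of a table-based KCM, in which partial products of the constant are pre-computed and then summed, can be illustrated in software. The following is a minimal Python sketch under stated assumptions, not the circuit proposed in the paper: the function names `build_tables` and `kcm_multiply`, the 4-bit chunk width, and the unsigned operand are all hypothetical choices made for illustration. The paper's LUT6-based designs group operand bits differently and fold the addition into the LUTs.

```python
def build_tables(constant, operand_bits, chunk_bits=4):
    """Pre-compute one lookup table per operand chunk.

    Entry d of table i holds constant * d * 2**(chunk_bits * i), i.e. the
    already-shifted partial product for chunk value d. The constant may be
    negative. chunk_bits=4 is an illustrative assumption.
    """
    n_chunks = -(-operand_bits // chunk_bits)  # ceiling division
    tables = []
    for i in range(n_chunks):
        shift = chunk_bits * i
        tables.append([(constant * d) << shift for d in range(1 << chunk_bits)])
    return tables


def kcm_multiply(tables, x, chunk_bits=4):
    """Multiply the unsigned operand x by the constant baked into the tables."""
    mask = (1 << chunk_bits) - 1
    total = 0
    for i, table in enumerate(tables):
        chunk = (x >> (chunk_bits * i)) & mask  # select the i-th chunk of x
        total += table[chunk]                   # add its pre-computed partial product
    return total


# Selectable coefficients: one table set per constant, chosen at "runtime".
coefficients = [93, -57, 12]
table_sets = [build_tables(c, operand_bits=8) for c in coefficients]
select = 1
product = kcm_multiply(table_sets[select], 200)  # multiplies 200 by -57
```

In this sketch, changing the constant only changes the table contents, not the structure of the lookups and additions, which loosely mirrors the abstract's point that LUT placement and routing stay fixed across constant values.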

Highlights

Field-programmable gate arrays (FPGAs) are often used for computationally intensive applications such as digital-signal processing (DSP), video and image processing, and artificial neural network (ANN) based applications such as machine learning and artificial intelligence. For these applications and others, multiplication is the dominant operation in terms of required resources, delay and power consumption.
When n is increased to n = 11, the second row still computes the sum of five partial products, but the first row computes the sum of six partial products, so each LUT6 only computes one bit of the sum.

Summary

Introduction

Field-programmable gate arrays (FPGAs) are often used for computationally intensive applications such as digital-signal processing (DSP), video and image processing, and artificial neural network (ANN) based applications such as machine learning and artificial intelligence. For these applications and others, multiplication is the dominant operation in terms of required resources, delay and power consumption. Soft constant-coefficient multipliers (KCMs) based on lookup tables (LUTs) in the configurable logic fabric are often used for high-performance designs for several reasons. In the target FPGAs, each slice has four 6-input lookup tables (LUT6s), designated A, B, C, and D.
