Low-Latency Bit-Accurate Architecture for Configurable Precision Floating-Point Division

Jincheng Xia, Wenjia Fu, Ming Liu, Mingjiang Wang

doi:10.3390/app11114988

Abstract

Floating-point division is indispensable and increasingly important in many modern applications. To speed up floating-point division in actual microprocessors, this paper proposes a low-latency, multi-precision architecture for floating-point division that meets the IEEE-754 standard. The design has three parts: pre-configuration, mantissa division, and quotient normalization. For mantissa division, a Predict–Correct algorithm based on the fast division algorithm is employed, which yields more partial quotient bits per cycle without consuming much additional circuit area. A detailed analysis is presented to support the guaranteed accuracy per cycle, with no restriction to specific parameters. In synthesis with the TSMC 90 nm standard cell library, the results show that the proposed architecture has ≈63.6% latency, ≈30.23% total time (latency × period), ≈31.8% total energy (power × latency × period), and ≈44.6% efficient average energy (power × latency × period/efficient length) overhead over the latest floating-point division structure. In terms of latency, the proposed division architecture is much faster than several classic processors.
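The three-part decomposition named in the abstract can be illustrated in software. The following is only a minimal sketch under stated assumptions, not the paper's hardware design: it fixes the format to IEEE-754 binary32, handles normal operands only, uses round-to-nearest-even, and replaces the Predict–Correct mantissa divider with a plain integer division. All function names (`preconfig`, `mantissa_divide`, `normalize`, `divide_f32`) are hypothetical.

```python
import struct

BIAS, FRAC_BITS = 127, 23  # IEEE-754 binary32 parameters (assumed format)

def f32_bits(x: float) -> int:
    """Round a Python float to binary32 and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(b: int) -> float:
    """Interpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def preconfig(bits: int):
    """Part 1: unpack sign, unbiased exponent, and 24-bit significand."""
    sign = bits >> 31
    exp = (bits >> FRAC_BITS) & 0xFF
    frac = bits & ((1 << FRAC_BITS) - 1)
    if exp in (0, 0xFF):
        raise ValueError("sketch handles normal numbers only")
    return sign, exp - BIAS, (1 << FRAC_BITS) | frac

def mantissa_divide(mx: int, my: int):
    """Part 2: 26-bit quotient in [2^25, 2^26), sticky flag, exponent adjust.
    A plain integer division stands in for the Predict-Correct iterations."""
    adjust = 0
    if mx < my:            # ratio in (0.5, 1): pre-shift so it lands in [1, 2)
        mx <<= 1
        adjust = -1
    q, rem = divmod(mx << 25, my)
    return q, rem != 0, adjust

def normalize(sign: int, e: int, q: int, sticky: bool) -> int:
    """Part 3: round to nearest even and pack the result."""
    sig = q >> 2                       # 24 significand bits
    guard = (q >> 1) & 1               # first discarded bit
    sticky = sticky or bool(q & 1)     # any lower discarded bit or remainder
    if guard and (sticky or (sig & 1)):
        sig += 1
    if sig == 1 << (FRAC_BITS + 1):    # defensive: rounding carried out
        sig >>= 1
        e += 1
    if not -126 <= e <= 127:
        raise OverflowError("result outside the normal range of this sketch")
    return (sign << 31) | ((e + BIAS) << FRAC_BITS) | (sig & ((1 << FRAC_BITS) - 1))

def divide_f32(a_bits: int, b_bits: int) -> int:
    sa, ea, ma = preconfig(a_bits)
    sb, eb, mb = preconfig(b_bits)
    q, sticky, adjust = mantissa_divide(ma, mb)
    return normalize(sa ^ sb, ea - eb + adjust, q, sticky)
```

For example, `hex(divide_f32(f32_bits(1.0), f32_bits(3.0)))` gives `0x3eaaaaab`, the binary32 encoding of 1/3. Zeros, subnormals, infinities, NaNs, and the other rounding modes are left out of this sketch.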

Highlights

Our research focuses on the architecture design of configurable precision FP arithmetic units
Fast division requires hardware with at least one look-up table of size 2m −1 × m bits and three multipliers, a carrying assimilation multiplier of size (m + 1) × n for the divisor’s initial multiplications and a carry-save multiplier of size (m + 1) × m for the quotient
Division to multiplicative iterations rather than subtractive iterations [42], pre-scaling operands [43,44,45], using Fourier division [46,47], using alternative digit codes such as binary-coded decimal (BCD) digits instead of decimal and basic binary digits [48], cascading multiple stages of lower radix dividers [49], overlap
Inspired by the fast division method [18], this paper proposes a Predict–Correct algorithm which will increase iteration speed by bringing about n more quotient bits than fast division without consuming many areas

Summary

Introduction

Literature exists describing division algorithms, of which digit recurrence, functional iteration, variable latency, very high radix, and look-up table are five typical implementations [3]. The Digital Equipment Corporation (DEC) Alpha 21164 [16] is one of the best examples of a variable-latency algorithm implementation. It is found in [17] that the average number of quotient bits retired in one iteration varies from 2 to 3, depending on the stream of bits in the partial remainder. According to [20], the main difference between SRT and very high-radix algorithms is that the latter has more complex divisor-multiple processing and quotient-digit selection hardware, which increases the cycle time and area. The proposed architecture is based on the very high-radix algorithm [18], which can work out much more than a 10-bit quotient in one clock cycle.
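The trade-off between radix and iteration count mentioned above can be illustrated with a small integer model. This is a generic radix-2^k digit-recurrence sketch, not the paper's Predict–Correct algorithm and not the method of [18]; the function name `high_radix_divide` and its parameters are hypothetical, and quotient-digit selection is done with an exact division, whereas hardware would approximate it.

```python
def high_radix_divide(x: int, d: int, n_bits: int, k: int):
    """Compute floor(x * 2**n_bits / d) for 0 <= x < d,
    retiring k quotient bits per iteration (radix 2**k)."""
    assert 0 <= x < d and n_bits % k == 0
    q, rem, iterations = 0, x, 0
    for _ in range(n_bits // k):
        rem <<= k            # scale the partial remainder by the radix
        digit = rem // d     # idealized quotient-digit selection, 0 <= digit < 2**k
        rem -= digit * d     # next partial remainder, still below d
        q = (q << k) | digit
        iterations += 1
    return q, rem, iterations
```

With a 24-bit quotient, `high_radix_divide(1, 3, 24, 2)` takes 12 iterations, while `high_radix_divide(1, 3, 24, 12)` takes 2 and returns the same quotient. The larger radix needs fewer cycles, but each cycle must select a 12-bit digit, which is where the extra selection hardware comes from.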

Background

Predict–Correct Algorithm with Accurate Quotient Approximation

Guaranteed Bits per Cycle Using Predict–Correct Algorithm

General Architecture and Main Parts

Part 1 PRECONFIG

Part 3 NORMALIZE

Design

Functional Verification

Related Work and Comparisons

Conclusions

Journal: Applied Sciences
Publication Date: May 28, 2021
Citations: 1
License type: CC BY 4.0

Similar Papers

Design Issues and Implementations for Floating-Point Divide–Add Fused
Alexandru Amaricai ... Oana Boncalo
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | VOL. 57
01 Apr 2010

Design and Implementation of Floating Point Divide-Add Fused Architecture
Kuldeep Pande ... Shashant Jaykar
01 Apr 2015

A pipelined architecture for user-defined floating-point complex division on FPGA
Shaobing Huang ... Fang-Jian Han
01 Apr 2017

Taylor Series Based Architecture for Quadruple Precision Floating Point Division
Manish Kumar Jaiswal ... Hayden K.-H So
01 Jul 2016
