Low Latency Floating-Point Division and Square Root Unit

Javier D Bruguera

doi:10.1109/tc.2019.2947899

Abstract

Digit-recurrence algorithms are widely used in current microprocessors to compute floating-point division and square root. These iterative algorithms offer a good trade-off between performance, area and power. We present a floating-point division and square root unit that implements a radix-64 floating-point division and a radix-16 floating-point square root. To keep the implementation affordable, each radix-64 division iteration and each radix-16 square root iteration is built from simpler radix-4 iterations: 3 radix-4 iterations for division and 2 for square root. Speculation is used between consecutive radix-4 iterations to reduce the delay. Digit-recurrence implementations have three distinct parts: initialization, digit iterations, and rounding. The digit iteration is the iterative part, and it reuses the same logic over several cycles. Division and square root partially share the initialization and rounding stages, whereas each has its own logic for the digit iterations. The result is a low-latency floating-point division and square root unit that requires 11, 6, and 4 cycles for double-, single- and half-precision division with normalized operands and result, and 15, 8 and 5 cycles for square root. One or two additional cycles are needed when an operand or the result is subnormal.
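To make the idea of building one radix-64 iteration from three radix-4 iterations concrete, the following is a minimal software sketch, not the hardware design described in the paper. It assumes operands already normalized to [1/2, 1), uses the radix-4 digit set {-2, ..., 2}, and selects each digit by exact rounding of the residual; a real unit would instead use a truncated residual estimate and a selection table, and the speculation between radix-4 steps is not modeled. The function names `radix4_step` and `radix64_divide` are hypothetical.

```python
from fractions import Fraction


def radix4_step(w, d):
    """One radix-4 digit-recurrence step: returns (digit, next residual).

    Digit selection by exact rounding is an assumption made for clarity;
    hardware uses a low-precision estimate of the residual instead.
    """
    q = max(-2, min(2, round(4 * w / d)))  # clamp to the digit set {-2..2}
    return q, 4 * w - q * d                # w[j+1] = 4*w[j] - q[j+1]*d


def radix64_divide(x, d, iterations):
    """Approximate x/d for x, d in [1/2, 1), 6 quotient bits per iteration.

    Returns (quotient, final residual). Exact rational arithmetic is used
    so the recurrence can be checked without rounding noise.
    """
    x, d = Fraction(x), Fraction(d)
    w = x / 4                 # scaled start so that |w| <= 2d/3 holds
    quotient = Fraction(0)
    weight = Fraction(1)
    for _ in range(iterations):
        digit64 = 0
        for _ in range(3):    # three radix-4 steps form one radix-64 digit
            q, w = radix4_step(w, d)
            digit64 = 4 * digit64 + q
        weight /= 64
        quotient += digit64 * weight
    return 4 * quotient, w    # undo the initial scaling by 1/4


if __name__ == "__main__":
    approx, residual = radix64_divide(Fraction(3, 4), Fraction(5, 8), 9)
    print(float(approx), float(residual))
```

With `iterations=9` the model accumulates 54 quotient bits, enough to cover a 53-bit double-precision significand. How the paper's 11 cycles are divided between initialization, digit iterations and rounding is not stated in the abstract, so the sketch makes no claim about that.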

Journal: IEEE Transactions on Computers
Publication Date: Feb 1, 2020
Citations: 38

Similar Papers

Tradeoffs of designing floating-point division and square root on Virtex FPGAs
Xiaojun Wang ... B.E Nelson
09 Apr 2003

Radix-64 Floating-Point Division and Square Root: Iterative and Pipelined Units
Javier D Bruguera
IEEE Transactions on Computers | VOL. 72
01 Oct 2023

Design And Optimization Of Floating Point Division And Square Root Using Minimal Device Latency
K Hema Priya ... S Praveen Kumar
IOP Conference Series: Materials Science and Engineering | VOL. 1084
01 Mar 2021

Radix-64 Floating-Point Divider
Javier D Bruguera
01 Jun 2018
