REEL: Reducing effective execution latency of floating point operations

Vignyan Reddy, Mikko H Lipasti, Erika Gunadi, Syed Zohaib Gilani, Michael J Schulte, Nam Sung Kim

doi:10.1109/islped.2013.6629292

Abstract

The height of the dynamic dependence graph of a program, as executed by a processor, determines a lower bound on its execution time. This height can be decreased by reducing the effective execution latency of the operations that form dependence chains in the graph. In this paper, we propose a technique called REEL that reduces the overall latency of chains of dependent floating-point (FP) operations by increasing the throughput of computation. REEL comprises a high-throughput floating-point unit (HFP) that allows early issue of an FP Add that depends on another FP Add or FP Multiply. This is complemented by instruction scheduler modifications that allow early issue of dependent FP Adds, and by novel checker logic that corrects any precision errors. Unlike conventional static operation fusion, such as fused multiply-add (FMA), REEL requires no changes to the instruction set to utilize the new hardware, and no recompilation is necessary. Furthermore, unlike ISA-level FMA, our technique produces bit-compatible results while boosting the performance of Add-Add dependence pairs in addition to Multiply-Add pairs. Our evaluation of REEL using the CFP2006 benchmarks shows an average performance gain of 7.6% and a maximum gain of 17%, while consuming 1.2% less energy.
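
The opening claim of the abstract can be made concrete with a small model: each operation finishes no earlier than its latest producer plus its own latency, so the largest finish time is the height of the dependence graph. The following Python sketch is a minimal illustration under stated assumptions, not the paper's model. The 4-cycle FP Add and FP Multiply latencies, the `early_issue` parameter, the `dag_height` function, and the op names are all hypothetical, and the overlap is only a crude stand-in for early issue of dependent FP Adds; it does not model the HFP datapath or the checker logic.

```python
# Minimal model of dependence-graph height (critical path) in cycles.
# ASSUMPTIONS (hypothetical, not from the paper): 4-cycle FP Add/Multiply
# latencies and a fixed number of cycles by which a dependent FP Add may
# issue early. The actual HFP timing is not modeled here.
LATENCY = {"fadd": 4, "fmul": 4}


def dag_height(ops, early_issue=0):
    """Return the cycle in which the last operation of the DAG completes.

    ops: dict mapping op name -> (kind, [producer names]), listed in
    topological order (every producer appears before its consumers).
    early_issue: cycles by which an FP Add may start before an FP Add or
    FP Multiply producer finishes (0 = conventional back-to-back issue).
    """
    finish = {}
    for name, (kind, producers) in ops.items():
        start = 0
        for p in producers:
            ready = finish[p]
            if kind == "fadd" and ops[p][0] in ("fadd", "fmul"):
                # Hypothetical early issue of a dependent FP Add.
                ready -= early_issue
            start = max(start, ready)
        finish[name] = start + LATENCY[kind]
    return max(finish.values())


# Hypothetical chain (m1 + m2) + m3, where m1..m3 are independent multiplies.
chain = {
    "m1": ("fmul", []),
    "m2": ("fmul", []),
    "a1": ("fadd", ["m1", "m2"]),
    "m3": ("fmul", []),
    "a2": ("fadd", ["a1", "m3"]),
}

print(dag_height(chain))                 # 12
print(dag_height(chain, early_issue=2))  # 8
```

In this toy example, shortening only the links into the Adds lowers the height from 12 to 8 cycles. The real gain depends on the hardware's actual latencies and on how often dependent Adds appear on the critical path.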
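
The bit-compatibility point rests on rounding. An FMA rounds a*b + c once, whereas a separate FP Multiply followed by an FP Add rounds twice, so the two can produce different results. The following Python sketch shows such a case in IEEE 754 double precision. The FMA is emulated with exact rational arithmetic (`fractions.Fraction`) rather than a hardware FMA, so the example does not depend on a Python version that provides `math.fma`. The helper names and input values are hypothetical and chosen for illustration; they are not from the paper.

```python
from fractions import Fraction


def mul_then_add(a, b, c):
    """Separate FP Multiply and FP Add: the product is rounded, then the sum."""
    product = a * b       # first rounding to the nearest double
    return product + c    # second rounding


def fused_multiply_add(a, b, c):
    """Emulated FMA: a*b + c is computed exactly, then rounded once.

    Every finite float converts exactly to a Fraction, and float(Fraction)
    rounds the exact rational result to the nearest double.
    """
    return float(Fraction(a) * Fraction(b) + Fraction(c))


# Exact a*b = 1 + 2**-29 + 2**-60; the 2**-60 term is lost when a*b is
# rounded on its own, but survives when the sum is rounded only once.
a = b = 1.0 + 2.0**-30
c = -(1.0 + 2.0**-29)

print(mul_then_add(a, b, c))                     # 0.0
print(fused_multiply_add(a, b, c) == 2.0**-60)   # True
```

A processor executing the program's separate Multiply and Add instructions must produce the twice-rounded result to stay bit compatible. This is why fusing them at the ISA level can change program output, whereas the abstract states that REEL's results remain bit compatible.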

Similar Papers

REEL: reducing effective execution latency of floating point operations
04 Sep 2013

Quadruple-precision BLAS using Bailey's arithmetic with FMA instruction: its performance and applications
Susumu Yamada ... Toshiyuki Imamura
01 May 2017

Estimating numerical error in neural network simulations on Graphics Processing Units
James P Turner ... Thomas Nowotny
BMC Neuroscience, Vol. 16
01 Dec 2015

DLX gold: design and implementation of a DLX microprocessor with single precision floating-point operations
John Edrian H Aguilar ... Rosario M Reas
01 Oct 2007
