Low-Cost Microarchitectural Support for Improved Floating-Point Accuracy

William R Dieter, Akil Kaveti, Henry G Dietz

doi:10.1109/l-ca.2007.1

Abstract

Some processors designed for consumer applications, such as graphics processing units (GPUs) and the CELL processor, promise outstanding floating-point performance for scientific applications at commodity prices. However, IEEE single precision is the most precise floating-point data type these processors directly support in hardware. Pairs of native floating-point numbers can be used to represent a base result and a residual term to increase accuracy, but the resulting order-of-magnitude slowdown dramatically reduces the price/performance advantage of these systems. By adding a few simple microarchitectural features, acceptable accuracy can be obtained with relatively little performance penalty. To reduce the cost of native-pair arithmetic, a residual register is used to hold information that would normally have been discarded after each floating-point computation. The residual register dramatically simplifies the code, providing both lower latency and better instruction-level parallelism.
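As an illustration of the native-pair idea described in the abstract, the sketch below computes a single-precision sum together with the residual that rounding would normally discard. It uses the well-known TwoSum error-free transformation (Knuth), which is an assumption chosen for illustration and not necessarily the exact sequence the authors use. The helper names `f32`, `add32`, `sub32`, and `two_sum` are hypothetical, and single precision is emulated in software by rounding through `struct`.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add32(a: float, b: float) -> float:
    """Emulated single-precision addition: add in binary64, then round to binary32."""
    return f32(a + b)


def sub32(a: float, b: float) -> float:
    """Emulated single-precision subtraction."""
    return f32(a - b)


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, r): s is the rounded single-precision sum, r is the residual.

    In round-to-nearest and without overflow, s + r equals a + b exactly.
    Software needs six floating-point operations here; a hardware residual
    register, as proposed in the abstract, would aim to make r available
    without the five extra operations (the details of that mechanism are
    not given in this text).
    """
    s = add32(a, b)                          # the ordinary, rounded result
    b_virtual = sub32(s, a)                  # the part of b that made it into s
    a_virtual = sub32(s, b_virtual)          # the part of a that made it into s
    r = add32(sub32(a, a_virtual), sub32(b, b_virtual))  # what rounding lost
    return s, r


if __name__ == "__main__":
    # 2**-30 is too small to change 1.0 in single precision,
    # so the whole addend ends up in the residual.
    print(two_sum(f32(1.0), f32(2.0 ** -30)))
```

The dependent chain of six operations in `two_sum`, compared with one operation for a plain add, gives a sense of where the slowdown of software native-pair arithmetic comes from.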

Journal: IEEE Computer Architecture Letters
Publication Date: Jan 1, 2007
Citations: 28

Similar Papers

Parallel hyperbolic PDE simulation on clusters: Cell versus GPU
Scott Rostrup ... Hans De Sterck
Computer Physics Communications | VOL. 181
26 Aug 2010

Analyzing GPU-controlled communication with dynamic parallelism in terms of performance and energy
Lena Oden ... Holger Fröning
Parallel Computing | VOL. 57
29 Mar 2016

Multi-tenant virtual GPUs for optimising performance of a financial risk application
Javier Prades ... Federico Silla
Journal of Parallel and Distributed Computing | VOL. 108
17 Jun 2016

A Fast Discrete Wavelet Transform Using Hybrid Parallelism on GPUs
Tran Minh Quan ... Won-Ki Jeong
IEEE Transactions on Parallel and Distributed Systems | VOL. 27
01 Nov 2016
