High-Precision Anchored Accumulators for Reproducible Floating-Point Summation

Neil Burgess, David R. Lutz, Chris Goodyer, Christopher N. Hinds

doi:10.1109/tc.2018.2855729

Abstract

This paper introduces a new datatype, the High-Precision Anchored (HPA) number, which allows reproducible accumulation of floating-point (FP) numbers over a programmer-selectable range. The new datatype has a larger significand and a smaller range than existing FP formats, and it has much better arithmetic and computational properties: in particular, it is associative, parallelizable, reproducible and correct. The paper also describes how HPA processing can be implemented as part of Arm's new Scalable Vector Extension (SVE), together with proposals for new instructions aimed specifically at the new datatype. For the modest ranges that accommodate most problems, HPA processing is much faster than FP arithmetic. Performance modelling shows that, on Arm's new vector architecture, 2-lane HPA accumulation of FP64 operands is 9.5 times faster than double-double accumulation and accelerates a recently published software algorithm for 3-lane reproducible FP summation by a factor of 5.6. The paper also discusses instruction-level optimizations for FP32 and FP16 summation that further increase HPA performance relative to FP64 accumulation.
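The abstract does not spell out the HPA encoding. As a rough illustration of why an anchored fixed-point format makes accumulation associative and reproducible, the Python sketch below converts each FP64 input exactly into an integer count of units of 2**anchor, where the anchor and width stand in for the programmer-selected range, and adds those integers modulo 2**nbits. This is an assumption-laden sketch, not the paper's SVE implementation: the names AnchoredAccumulator, add, value, to_float and naive_sum are hypothetical, a single unbounded Python int replaces the multi-lane HPA register, and the paper's actual lane layout and overflow handling are not modeled.

```python
import itertools
import math
from fractions import Fraction


class AnchoredAccumulator:
    """Hypothetical sketch of an anchored fixed-point accumulator.

    The running sum is an nbits-wide two's-complement integer whose
    least-significant bit has weight 2**anchor. A Python int stands in
    for the multi-lane register; this is not the paper's encoding.
    """

    def __init__(self, anchor: int, nbits: int) -> None:
        self.anchor = anchor            # exponent of the least-significant bit
        self.nbits = nbits              # total width, including the sign bit
        self._mask = (1 << nbits) - 1
        self._acc = 0                   # stored as an unsigned nbits-bit pattern

    def _to_fixed(self, x: float) -> int:
        """Convert x exactly to an integer count of 2**anchor units."""
        if not math.isfinite(x):
            raise ValueError("only finite inputs can be accumulated")
        num, den = x.as_integer_ratio()  # den is always a power of two
        if self.anchor <= 0:
            q, r = divmod(num << -self.anchor, den)
        else:
            q, r = divmod(num, den << self.anchor)
        if r != 0:                       # input has bits below the anchor
            raise ValueError(f"{x!r} has bits below 2**{self.anchor}")
        if not -(1 << (self.nbits - 1)) <= q < (1 << (self.nbits - 1)):
            raise OverflowError(f"{x!r} is outside the anchored range")
        return q

    def add(self, x: float) -> None:
        # Integer addition modulo 2**nbits is associative and commutative,
        # so the final bit pattern does not depend on the input order.
        self._acc = (self._acc + self._to_fixed(x)) & self._mask

    def value(self) -> Fraction:
        """Exact accumulated value (correct if the true sum fits in range)."""
        v = self._acc
        if v >> (self.nbits - 1):        # sign bit set: reinterpret as negative
            v -= 1 << self.nbits
        return Fraction(v) * Fraction(2) ** self.anchor

    def to_float(self) -> float:
        """Round the exact sum to FP64 once, at the very end."""
        return float(self.value())


def naive_sum(values):
    """Plain left-to-right FP64 addition, for comparison."""
    total = 0.0
    for x in values:
        total += x
    return total


if __name__ == "__main__":
    data = [1e16, 1.0, -1e16]
    for perm in itertools.permutations(data):
        acc = AnchoredAccumulator(anchor=-60, nbits=128)
        for x in perm:
            acc.add(x)
        print(perm, "anchored:", acc.to_float(), "naive:", naive_sum(perm))
```

Because every input is converted exactly and integer addition is associative, all six orderings of the data give the same anchored result (1.0), whereas naive left-to-right FP64 addition returns 0.0 or 1.0 depending on the order. The trade-off mirrors the one the abstract describes: inputs with bits below the anchor, or magnitudes beyond the chosen width, are rejected rather than rounded, so the range has to be chosen to suit the problem.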
