Exact Versus Inexact Decimal Floating-Point Numbers and Arithmetic

Muhamed F Mudawar

doi:10.1109/access.2023.3244891

Abstract

The IEEE 754 standard does not distinguish between exact and inexact floating-point numbers. No bit or field in the encoding indicates whether a floating-point number is exact, and this holds for both binary and decimal floats. An inexact operation raises an inexact flag in a floating-point status register, but the rounded result is then used in later operations as if it were exact. The floating-point arithmetic unit treats all input operands as if they were exact, and hence might produce substantial errors in the final computed results. This paper makes the distinction between exact and inexact decimal numbers and defines arithmetic operations on both types of numbers. If the result of a sequence of operations is exact, the user can trust that every decimal digit in the computed result is correct. On the other hand, if some input operands are inexact or the result cannot be computed exactly, a loss of significant digits occurs. In that case, a different representation is used for the inexact computed value, and an estimate of the absolute error is part of the inexact result. The decimal numbers and arithmetic operations introduced in this paper produce more accurate results than those computed under the IEEE 754 standard. A simple evaluation is shown in the last section of this paper.
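The flag behavior described in the abstract can be observed with Python's standard `decimal` module, whose contexts expose an `Inexact` signal in the manner of IEEE 754 decimal arithmetic. The sketch below is only an illustration of the problem, not the representation or arithmetic proposed in the paper; the function name `inexact_flag_demo`, the dictionary keys, and the 7-digit precision are assumptions chosen for the example.

```python
from decimal import Decimal, Inexact, localcontext


def inexact_flag_demo(precision=7):
    """Show that the inexact flag is a side channel, not part of the value.

    Hypothetical helper for illustration only.
    """
    with localcontext() as ctx:
        ctx.prec = precision

        # 1/4 is representable in 7 digits: the flag stays clear.
        ctx.clear_flags()
        quarter = Decimal(1) / Decimal(4)
        quarter_inexact = bool(ctx.flags[Inexact])

        # 1/3 must be rounded: the flag is raised in the context,
        # but the returned Decimal carries no mark of inexactness.
        ctx.clear_flags()
        third = Decimal(1) / Decimal(3)
        third_inexact = bool(ctx.flags[Inexact])

        # The rounded value is now consumed as if it were exact.
        # The product fits in 7 digits, so this operation raises no flag,
        # yet the result differs from the mathematically correct value 1.
        ctx.clear_flags()
        product = third * 3
        product_inexact = bool(ctx.flags[Inexact])

    return {
        "quarter": quarter,
        "quarter_inexact": quarter_inexact,
        "third": third,
        "third_inexact": third_inexact,
        "product": product,
        "product_inexact": product_inexact,
    }


if __name__ == "__main__":
    for name, value in inexact_flag_demo().items():
        print(name, value)
```

Once the flag is cleared, or simply ignored, nothing in `product` itself records that one of its operands was rounded, which is the gap between exact and inexact values that the paper sets out to close.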

Full Text