Customizing Number Representation and Precision

Olivier Sentieys, Daniel Ménard

doi:10.1007/978-3-030-94705-7_2

Abstract

Interest in reduced-precision arithmetic is growing, driven in part by the recent surge in artificial intelligence and especially deep learning. Most architectures already provide reduced-precision capabilities (e.g., 8-bit integer, 16-bit floating-point), and FPGAs can go further and support any number format and bit-width. In computer arithmetic, the representation of real numbers is a major issue. Fixed-point (FxP) and floating-point (FlP) are the main options for representing reals, each with its advantages and drawbacks. This chapter presents both FxP and FlP number representations and draws a fair comparison of their cost, performance, and energy, as well as their impact on accuracy during computations. It shows that the choice between FxP and FlP is not obvious and depends strongly on the application considered. In some cases, low-precision floating-point arithmetic can be the most effective option and offers benefits over the classical fixed-point choice for energy-constrained applications.
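As a rough illustration of why the choice depends on the application, the following Python sketch rounds a few values to a 16-bit fixed-point format and to a 16-bit floating-point format and compares the relative errors. The specific formats (16-bit FxP with 8 fractional bits, IEEE 754 binary16 for FlP), the helper names `to_fixed`, `to_half`, and `rel_error`, and the sample values are assumptions chosen for illustration; they are not taken from the chapter.

```python
import struct


def to_fixed(x, frac_bits=8, total_bits=16):
    """Round x to a signed fixed-point format (assumed: 16 bits, 8 fractional)
    with saturation, and return the real value that the code represents."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))  # integer code, saturated
    return q / scale


def to_half(x):
    """Round x to IEEE 754 binary16 using the struct module's 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def rel_error(x, approx):
    """Relative error of approx with respect to a nonzero reference x."""
    return abs(approx - x) / abs(x)


if __name__ == "__main__":
    for x in (0.01, 1.2345, 100.3):
        print(f"x={x:<8} FxP rel. err={rel_error(x, to_fixed(x)):.2e}  "
              f"FlP rel. err={rel_error(x, to_half(x)):.2e}")
```

With these assumed formats, the fixed-point step is constant (1/256), so small values such as 0.01 suffer a large relative error, while values near the top of the range are represented more accurately than in binary16. The floating-point format keeps the relative error roughly constant across its dynamic range. Which behavior is preferable depends on the value distribution of the application.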

Full Text