Abstract

Using standard Floating-Point (FP) formats for computation leads to significant hardware overhead, since these formats are over-designed for error-resilient workloads such as iterative algorithms. Hence, hardware FP Unit (FPU) architectures need run-time variable-precision capabilities. In this work, we propose a new method, called Variable Precision in Time (VPT), and an FPU architecture that enable designers to tune the precision of FP computations automatically at run-time, leading to significant power-consumption, execution-time, and energy savings. In spite of its circuit-area overhead, the proposed approach simplifies the integration of variable precision into existing software workloads at any level of the software stack (OS, RTOS, or application level): it requires only lightweight software support and relies solely on traditional assembly instructions, without the need for a specialized compiler or custom instructions. We apply the technique to the Jacobi and Gauss–Seidel iterative methods, taking full advantage of the proposed FPU. For each algorithm, two modified versions are proposed: a conservative one and a relaxed one. Both algorithms are analyzed and compared statistically to understand the effects of VPT on iterative applications. The implementations demonstrate up to 70.67% power-consumption savings, up to 59.80% execution-time savings, and up to 88.20% total energy savings with respect to the reference double-precision implementation, with no accuracy loss.
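The precision-over-time idea can be emulated in software by truncating mantissa bits of intermediate results and widening the precision as the iterate stabilizes. The sketch below is a hypothetical illustration of such a relaxed Jacobi variant: the `truncate_mantissa` helper, the widening schedule, and its thresholds are assumptions for illustration only, not the paper's actual policy (which tunes precision inside the FPU at run-time).

```python
import struct

def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Emulate a reduced-precision FP result by zeroing the low
    (52 - keep_bits) mantissa bits of a binary64 value."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    mask = ~((1 << (52 - keep_bits)) - 1) & 0xFFFFFFFFFFFFFFFF
    return struct.unpack("<d", struct.pack("<Q", bits & mask))[0]

def jacobi_vpt(A, b, tol=1e-10, max_iter=200):
    """Jacobi iteration that starts with a short mantissa and
    widens the precision as successive iterates stabilize
    (a software emulation of the VPT idea, not the paper's policy)."""
    n = len(b)
    x = [0.0] * n
    keep = 8                                    # start with few mantissa bits
    for _ in range(max_iter):
        x_new = []
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new.append(truncate_mantissa((b[i] - s) / A[i][i], keep))
        err = max(abs(xn - xo) for xn, xo in zip(x_new, x))
        x = x_new
        if err < tol and keep == 52:            # converged at full precision
            break
        if err < 2.0 ** -(keep - 4):            # iterate stalled near the
            keep = min(52, keep + 8)            # current precision: widen it
    return x

# Diagonally dominant system; exact solution is x = (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = jacobi_vpt(A, b)
```

Early iterations, which only need a rough estimate, run with few mantissa bits; full double precision is used only for the final refinements, which mirrors why the relaxed variants in the paper can save energy without accuracy loss.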

Highlights

  • Many industrial applications have emerged in domains such as the Internet of Things (IoT), Artificial Intelligence (AI), Neural Networks (NNs), etc. with a common characteristic: inherent error-resilience

  • We focus on CPU architectures and precisely on the 64-bit RISC-V [35] Instruction Set Architecture (ISA), with single-precision (F) and double precision (D) extensions

  • After assessing the effectiveness of the proposed approach from a statistical and software point of view, this section presents a hardware-level evaluation of the power, execution time, and energy savings related to computations occurring inside the FP Unit (FPU)



Introduction

Many industrial applications have emerged in domains such as the Internet of Things (IoT), Artificial Intelligence (AI), Neural Networks (NNs), etc., with a common characteristic: inherent error-resilience. For such a class of applications, HW/SW designers can trade the precision of computations against cost, resource, and power savings. An FPU is usually responsible for an extensive amount of power consumption and high memory bandwidth. The energy consumption of FP arithmetic is known to be higher than that of its integer counterpart [2], making FPU optimization a priority. A binary floating-point number can be written in the form (−1)^s × (1 + m) × 2^e, where s is the sign bit, m is the mantissa (also called the significand or fraction), and e is the exponent. A Floating-Point (FP) format is defined by the pair (E, M), where E is the bit-width of its exponent field and M is the bit-width of its mantissa field.
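The (E, M) field layout can be made concrete by decoding a binary64 (double-precision) value, where (E, M) = (11, 52). The short Python sketch below (`decode_double` is an illustrative helper, not from the paper) recovers s, e, and m from the raw bits:

```python
import struct

def decode_double(x: float):
    """Split a binary64 float into its (s, e, m) fields.

    binary64 uses the (E, M) = (11, 52) format: 1 sign bit,
    11 exponent bits (bias 1023), and 52 mantissa bits, so a
    normal number's value is (-1)^s * (1 + m) * 2^e.
    """
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    s = bits >> 63
    e = ((bits >> 52) & 0x7FF) - 1023          # unbias the exponent
    m = (bits & ((1 << 52) - 1)) / (1 << 52)   # fractional mantissa in [0, 1)
    return s, e, m

s, e, m = decode_double(-6.5)                  # -6.5 = -1.625 * 2^2
assert (-1) ** s * (1 + m) * 2 ** e == -6.5
```

Narrower formats such as binary32 simply shrink the pair to (E, M) = (8, 23), which is the knob a variable-precision FPU turns at run-time.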

