Abstract
Mathematical libraries are typically developed for the fixed-width data paths of processors and target common floating-point formats such as IEEE binary32 and binary64. To address the increasing energy-consumption and throughput requirements of HPC, scientific-computing, and AI applications, libraries and hardware implementations now provide new floating-point formats, allowing mathematical function evaluation with different performance and accuracy trade-offs. In this article we present a methodology, and an associated proof-of-concept tool, for evaluating the benefits of custom-accuracy mathematical library calls in HPC and scientific computations. First, our tool collects the input- and output-data profile of each call-site of a mathematical function. Then, using a heuristic exploration algorithm, it estimates the minimal required accuracy by rounding results to lower precisions. The data profile and per-call-site accuracy measurement are used to speculatively select the mathematical function implementation with the most appropriate accuracy for a given scenario. We have tested the methodology with the Intel MKL Vector Math (VM) library, leveraging its predefined accuracy levels. We demonstrate the benefits of our approach on two real-world applications: SGP4, a satellite-tracking application, and PATMOS, a Monte Carlo neutron transport code. The robustness of the methodology is estimated by measuring the numerical accuracy of the resulting optimized code against user-defined criteria. We experiment with and discuss generalization across data-sets, and finally propose a speculative runtime implementation for PATMOS. The experiments provide insight into the performance improvements achievable by controlling per-function call-site accuracy-mode execution with the Intel compiler's SVML library; we show time reductions of 13 to 55 percent for the PATMOS use case.
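The core idea of estimating the minimal required accuracy by rounding results to lower precisions can be illustrated with a small sketch. The helpers below are hypothetical, not the paper's actual tool: `round_to_precision` emulates a reduced-precision result by truncating a binary64 value to a given number of significand bits, and `minimal_precision` searches for the smallest width whose rounding error stays within a user-defined relative tolerance.

```python
import math

def round_to_precision(x: float, bits: int) -> float:
    """Round a binary64 value to `bits` significand bits.

    Hypothetical helper: emulates a lower-precision result so its
    error against the full-precision value can be measured.
    """
    if x == 0.0 or not math.isfinite(x):
        return x
    # x = m * 2**exp with 0.5 <= |m| < 1; scale so the significand
    # occupies `bits` integer bits, round, then scale back.
    exp = math.frexp(x)[1]
    scale = 2.0 ** (bits - exp)
    return round(x * scale) / scale

def minimal_precision(x: float, tol: float) -> int:
    """Smallest significand width whose rounding error stays within
    the relative tolerance `tol`, for one sample call-site input."""
    exact = math.sin(x)  # stand-in for a profiled math-library call
    for bits in range(1, 54):
        approx = round_to_precision(exact, bits)
        if abs(approx - exact) <= tol * abs(exact):
            return bits
    return 53  # full binary64 significand width
```

In the methodology described above, an estimate of this kind per call-site would then guide the choice among a vector math library's predefined accuracy levels, rather than selecting a bit width directly.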