Using Ginkgo's memory accessor for improving the accuracy of memory‐bound low precision BLAS

Thomas Grützmacher, Hartwig Anzt, Enrique S Quintana‐Ortí

doi:10.1002/spe.3041

Abstract

The roofline model not only provides a powerful tool to relate an application's performance to the specific constraints imposed by the target hardware but also offers a graphic representation of the balance between memory access cost and compute throughput. In this work, we present a strategy to break up the tight coupling between the precision format used for arithmetic operations and the storage format employed for memory operations. (At a high level, this idea is equivalent to decompressing the data in registers after each load and compressing it again before each store.) In practice, we demonstrate that a “memory accessor” that hides the data compression behind the memory access can virtually push the bandwidth‐induced roofline, yielding higher performance for memory‐bound applications that use high precision arithmetic and can tolerate the numerical effects of lossy compression. We also demonstrate that memory‐bound applications operating on low precision data can increase their accuracy by relying on the memory accessor to perform all arithmetic operations in high precision. In particular, we demonstrate that memory‐bound BLAS operations (including the sparse matrix‐vector product) can be re‐engineered with the memory accessor and that the resulting accessor‐enabled BLAS routines achieve lower rounding errors while delivering the same performance as the fast low precision BLAS.
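The decoupling of storage and arithmetic formats described above can be illustrated with a minimal C++ sketch. It is not Ginkgo's actual accessor interface: the names `reduced_accessor`, `accessor_dot`, and `plain_dot` are hypothetical and chosen only for this example, which assumes values are stored as `float` and all arithmetic is done in `double`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accessor (an illustration, not Ginkgo's API): data lives in
// memory as StorageType, but every value handed to the caller is ArithType.
template <typename ArithType, typename StorageType>
class reduced_accessor {
public:
    explicit reduced_accessor(std::vector<StorageType>& data) : data_(data) {}

    std::size_t size() const { return data_.size(); }

    // Load: convert ("decompress") from storage to arithmetic precision.
    ArithType load(std::size_t i) const
    {
        return static_cast<ArithType>(data_[i]);
    }

    // Store: convert ("compress") from arithmetic to storage precision.
    void store(std::size_t i, ArithType value)
    {
        data_[i] = static_cast<StorageType>(value);
    }

private:
    std::vector<StorageType>& data_;
};

// Dot product on low precision data: memory traffic is that of StorageType,
// while products and the running sum are kept in ArithType.
template <typename ArithType, typename StorageType>
ArithType accessor_dot(const reduced_accessor<ArithType, StorageType>& x,
                       const reduced_accessor<ArithType, StorageType>& y)
{
    assert(x.size() == y.size());
    ArithType sum{};
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x.load(i) * y.load(i);
    }
    return sum;
}

// Baseline for comparison: storage and arithmetic both in float.
float plain_dot(const std::vector<float>& x, const std::vector<float>& y)
{
    assert(x.size() == y.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

As a usage example, take the `float` vectors x = {1e8, 1, -1e8} and y = {1, 1, 1}. With IEEE 754 single precision accumulation, `plain_dot` absorbs the 1 into 1e8 and the sum cancels to zero, whereas `accessor_dot` with `reduced_accessor<double, float>` reads the same 4-byte values from memory and returns 1. Both routines move the same number of bytes, which is why a memory‐bound kernel can be expected to gain accuracy at little or no cost in runtime.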

Full Text