Performance and energy consumption of the SIMD Gram–Schmidt process for vector orthogonalization

Thomas Jakobs, Billy Naumann, Gudula Rünger

doi:10.1007/s11227-019-02839-0

Abstract

In linear algebra and numerical computing, orthogonalizing a set of vectors is an important building block, so an efficient implementation on recent architectures is needed to provide a useful kernel for high-performance applications. In this article, we consider the orthogonalization of a set of vectors with the Gram–Schmidt method and develop SIMD implementations for processors that provide the Advanced Vector Extensions (AVX), a set of instructions for SIMD execution on recent Intel and AMD CPUs. Several SIMD implementations of the Gram–Schmidt process are built, and their behavior with respect to performance and energy consumption is investigated. In particular, different ways of implementing the SIMD programs are proposed, and several optimizations are studied. The hardware platforms used are Intel Core, Xeon and Xeon Phi processors supporting the AVX versions AVX, AVX2 and AVX512.
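For orientation, the Gram–Schmidt kernel named above can be sketched in scalar C. This is a minimal illustration and not the implementation studied in the article: the function names `dot` and `gram_schmidt`, the row-major storage layout, the choice of the modified Gram–Schmidt ordering, and the omission of normalization are all assumptions made here. The dot product and the vector update are the two inner loops that a SIMD version would plausibly map to AVX instructions.

```c
#include <stddef.h>

/* Dot product of two vectors of length n (hypothetical helper). */
static double dot(const double *x, const double *y, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        s += x[i] * y[i];
    }
    return s;
}

/*
 * Orthogonalize m vectors of length n in place.
 * Assumed layout: vector k occupies v[k*n] .. v[k*n + n - 1].
 * Modified Gram-Schmidt ordering, without normalization:
 * once vector j is final, its component is removed from
 * every later vector k > j.
 */
void gram_schmidt(double *v, size_t m, size_t n) {
    for (size_t j = 0; j < m; ++j) {
        const double *u = v + j * n;
        double uu = dot(u, u, n);
        if (uu == 0.0) {
            continue; /* zero vector: nothing to project onto */
        }
        for (size_t k = j + 1; k < m; ++k) {
            double *w = v + k * n;
            double c = dot(w, u, n) / uu; /* projection coefficient */
            for (size_t i = 0; i < n; ++i) {
                w[i] -= c * u[i];         /* w = w - c * u */
            }
        }
    }
}
```

Calling `gram_schmidt(v, 3, 3)` on the rows (1,1,0), (1,0,1), (0,1,1) leaves three mutually orthogonal rows in `v`, up to floating-point rounding.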

Full Text