Abstract

The solution of (generalized) eigenvalue problems for symmetric or Hermitian matrices is a common subtask of many numerical calculations in electronic structure theory and materials science. Depending on the scientific problem, solving the eigenvalue problem can easily amount to a sizeable fraction of the whole numerical calculation, and quite often it is the dominant part by far. For researchers in computational materials science, an efficient and scalable solution of the eigenvalue problem is thus of major importance. The ELPA library (Eigenvalue SoLvers for Petaflop-Applications) is a well-established dense direct eigenvalue solver library, which has proven to be very efficient and scalable up to very large core counts. It is in widespread production use on a large variety of HPC systems worldwide, and is used by many codes in the field of materials science. In this paper, we describe the latest optimizations of the ELPA library for new HPC architectures based on the Intel Skylake processor family with the AVX-512 SIMD instruction set, and for HPC systems accelerated with recent GPUs. Beyond these hardware-related optimizations, we also describe a complete redesign of the API in a modern, modular way, which makes the library simpler and more flexible to use and opens a new path to system-specific performance optimizations. To ensure optimal performance for a particular scientific setting or a specific HPC system, the new API allows the user to influence, in a straightforward way, the internal details of the algorithms and the performance-critical parameters used in the ELPA library. On top of that, we introduce an autotuning functionality, which finds the best settings in a self-contained, automated way with little user effort.
In situations where many eigenvalue problems with similar settings have to be solved consecutively, the autotuning process of the ELPA library can be done “on-the-fly”, without preceding the simulation with an “artificial” autotuning step. Practical applications from materials science that reach a numerical convergence limit through so-called self-consistency iterations can profit from the on-the-fly autotuning. The advantages of the latest optimizations of the ELPA library are demonstrated on some examples of scientific interest, simulated with the FHI-aims application.
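The on-the-fly idea can be illustrated with a minimal sketch in Python. This is not the ELPA API: the class `OnTheFlyTuner`, the helper `solve_with`, and the choice of the `UPLO` argument of `numpy.linalg.eigh` as the tuned setting are hypothetical stand-ins for ELPA's performance-critical parameters. The sketch only shows the pattern: each candidate setting is timed on one of the real solves of the iteration loop, and once all candidates have been tried the fastest one is used for the remaining solves.

```python
import time
import numpy as np


class OnTheFlyTuner:
    """Hypothetical tuner: times each candidate on a real solve, then keeps the fastest."""

    def __init__(self, candidates):
        self.candidates = list(candidates)
        if not self.candidates:
            raise ValueError("at least one candidate setting is required")
        self.timings = {}   # setting -> measured wall time in seconds
        self.best = None    # set once every candidate has been timed

    def solve(self, solver, matrix):
        # While untested candidates remain, use the next one; afterwards use the best.
        untested = [c for c in self.candidates if c not in self.timings]
        setting = untested[0] if untested else self.best
        start = time.perf_counter()
        result = solver(matrix, setting)
        elapsed = time.perf_counter() - start
        if untested:
            self.timings[setting] = elapsed
            if len(self.timings) == len(self.candidates):
                self.best = min(self.timings, key=self.timings.get)
        return result


def solve_with(matrix, uplo):
    # Stand-in solver: symmetric eigenproblem, reading the given triangle of the matrix.
    return np.linalg.eigh(matrix, UPLO=uplo)


# Mock self-consistency loop: a sequence of similar symmetric matrices.
rng = np.random.default_rng(0)
n = 200
base = rng.standard_normal((n, n))
base = (base + base.T) / 2

tuner = OnTheFlyTuner(["L", "U"])
for step in range(5):
    h = base + 0.01 * step * np.eye(n)
    eigenvalues, eigenvectors = tuner.solve(solve_with, h)
```

No solve is wasted on tuning alone: every timed call also returns a result that the surrounding iteration uses. A single timing per candidate is noisy, so a more careful variant would repeat or average measurements before fixing the choice.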

Highlights

  • When developing and maintaining a library for HPC applications, the developers generally have to make a difficult decision: on the one hand, they can develop a specialized library which shows the best performance on certain HPC hardware; on the other hand, they can develop a general library which supports a huge variety of HPC systems, although, as a consequence, the performance tuning becomes much more complex

  • In the HPC community, this has led, on the one hand, to general, standard libraries such as BLAS, LAPACK, and ScaLAPACK [9, 8, 7], which are open-source and can be compiled and used on every system

  • This paper is organized as follows: After recapitulating the mathematical background of the solution to an eigenvalue problem in Section 2, we describe in Section 3 the latest optimizations for the Intel Xeon Skylake and NVIDIA GPU architectures and show some performance results


Summary

Introduction

When developing and maintaining a library for HPC applications, the developers generally have to make a difficult decision: on the one hand, they can develop a specialized library which shows the best performance on certain HPC hardware; on the other hand, they can develop a general library which supports a huge variety of HPC systems, although, as a consequence, the performance tuning becomes much more complex. This choice is forced by the extreme variety of available HPC systems: it is an almost impossible endeavour to optimize a library for each available processor from different manufacturers (or even for different processors from the same manufacturer), each with its own CPU frequency, cache hierarchy and cache sizes, and SIMD instruction set, to mention only a few characteristics.

The eigenvalue problem solved by ELPA
Intel Xeon Skylake optimizations
GPU-related optimizations
Redesign of the ELPA library
Autotuning
Applications in quantum mechanics
Introduction to electronic structure computations
Performance benefits by autotuning
