Optimizations of the eigensolvers in the ELPA library

P. Kůs, A. Marek, S.S. Köcher, H.-H. Kowalski, C. Carbogno, Ch. Scheurer, K. Reuter, M. Scheffler, H. Lederer

doi:10.1016/j.parco.2019.04.003

Abstract

The solution of (generalized) eigenvalue problems for symmetric or Hermitian matrices is a common subtask of many numerical calculations in electronic structure theory and materials science. Depending on the scientific problem, solving the eigenvalue problem can easily amount to a sizeable fraction of the whole numerical calculation, and quite often it is the dominant part by far. For researchers in the field of computational materials science, an efficient and scalable solution of the eigenvalue problem is thus of major importance. The ELPA library (Eigenvalue SoLvers for Petaflop-Applications) is a well-established dense direct eigenvalue solver library, which has proven to be very efficient and scalable up to very large core counts. It is in widespread production use on a large variety of HPC systems worldwide, and is applied by many codes in the field of materials science. In this paper, we describe the latest optimizations of the ELPA library for new HPC architectures of the Intel Skylake processor family with the AVX-512 SIMD instruction set, and for HPC systems accelerated with recent GPUs. Apart from those direct hardware-related optimizations, we also describe a complete redesign of the API in a modern, modular way, which, in addition to being much simpler and more flexible to use, opens a new path to system-specific performance optimizations. In order to ensure optimal performance for a particular scientific setting or a specific HPC system, the new API allows the user to influence, in a straightforward way, the internal details of the algorithms and the performance-critical parameters used in the ELPA library. On top of that, we introduce an autotuning functionality, which finds the best settings in a self-contained, automated way, without much user effort.
In situations where many eigenvalue problems with similar settings have to be solved consecutively, the autotuning process of the ELPA library can be done "on-the-fly", without having to precede the simulation with an "artificial" autotuning step. Practical applications from materials science that rely on reaching a numerical convergence limit by so-called self-consistency iterations can profit from the on-the-fly autotuning. Using several examples of scientific interest, simulated with the FHI-aims application, we demonstrate the advantages of the latest optimizations of the ELPA library.
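The on-the-fly idea can be illustrated with a minimal sketch. This is not the ELPA API: the class `OnTheFlyTuner`, its `run` method, the `clock` parameter, and the candidate settings are hypothetical names chosen for illustration only. The sketch assumes that each production solve in a sequence can be run with one of a few candidate settings; it times one candidate per solve and, once every candidate has been tried, reuses the fastest one for all remaining solves, so no separate tuning run is needed.

```python
import time
from typing import Callable, Dict, Hashable, Iterable, Optional


class OnTheFlyTuner:
    """Hypothetical helper (not part of ELPA): tries one candidate setting
    per production solve, then keeps the fastest for all later solves."""

    def __init__(self, candidates: Iterable[Hashable],
                 clock: Callable[[], float] = time.perf_counter) -> None:
        # Drop duplicate candidates while preserving their order.
        self._candidates = list(dict.fromkeys(candidates))
        if not self._candidates:
            raise ValueError("need at least one candidate setting")
        self._timings: Dict[Hashable, float] = {}
        self._clock = clock
        self.best: Optional[Hashable] = None

    def run(self, solve: Callable[[Hashable], object]) -> object:
        """Perform one real solve; time it while tuning is still in progress."""
        if self.best is not None:
            # Tuning has finished: reuse the winning setting.
            return solve(self.best)
        # Pick the next candidate that has not been timed yet.
        setting = self._candidates[len(self._timings)]
        start = self._clock()
        result = solve(setting)
        self._timings[setting] = self._clock() - start
        if len(self._timings) == len(self._candidates):
            # Every candidate has been timed once: select the fastest.
            self.best = min(self._timings, key=self._timings.get)
        return result


if __name__ == "__main__":
    def toy_solve(block_size: int) -> int:
        # Stand-in for a real eigenvalue solve; its cost depends on the setting.
        time.sleep(0.001 * abs(block_size - 32))
        return block_size

    tuner = OnTheFlyTuner([16, 32, 64])
    for _ in range(6):  # e.g. six self-consistency iterations
        tuner.run(toy_solve)
    print("selected setting:", tuner.best)
```

Every call to `run` returns a usable result, including the calls made during tuning, which is the point of doing the tuning inside the production loop. A single timing per candidate is the simplest possible policy; a real tuner would likely need repeated measurements or a way to handle problems whose size changes between solves.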

Highlights

When developing and maintaining a library for HPC applications, the developers generally have to make a difficult decision: on the one hand, they can develop a specialized library which shows the best performance on certain HPC hardware; on the other, they can develop a general library which supports a huge variety of HPC systems, although, as a consequence, the performance tuning becomes much more complex.
In the HPC community this has led to the existence of, on the one hand, general standard libraries such as BLAS, LAPACK, and ScaLAPACK [9, 8, 7], which are open-source and can be compiled and used on every system.
This paper is organized as follows: After recapitulating the mathematical background of the solution of an eigenvalue problem in Section 2, we describe in Section 3 the latest optimizations for the Intel Xeon Skylake and NVIDIA GPU architectures and show some performance results.

Summary

Introduction

When developing and maintaining a library for HPC applications, the developers generally have to make a difficult decision: on the one hand, they can develop a specialized library which shows the best performance on certain HPC hardware; on the other, they can develop a general library which supports a huge variety of HPC systems, although, as a consequence, the performance tuning becomes much more complex. This choice is forced by the extreme variety of available HPC systems: it is an almost impossible endeavour to optimize a library for each available processor from different manufacturers (or even for different processors from the same manufacturer), each with its own characteristics of CPU frequency, cache hierarchy and cache sizes, and SIMD instruction set, to mention only a few.

The eigenvalue problem solved by ELPA

Intel Xeon Skylake optimizations

GPU-related optimizations

Redesign of the ELPA library

Autotuning

Applications in quantum mechanics

Introduction to electronic structure computations

Performance benefits by autotuning


Journal: Parallel Computing
Publication Date: Apr 16, 2019
Citations: 21
License type: CC BY
