Portable LQCD Monte Carlo code using OpenACC

Claudio Bonati, Sebastiano Fabio Schifano, Francesco Negro, Raffaele Tripiccione, Giorgio Silvi, Simone Coscetti, Michele Mesiti, Enrico Calore, Massimo D’Elia, M Della Morte, P Fritzsch, C Pena Ruano, E Gámiz Sánchez

doi:10.1051/epjconf/201817509008

Open Access

https://doi.org/10.1051/epjconf/201817509008

Abstract

The current landscape of HPC architectures is extremely heterogeneous, ranging from multi-core CPUs to many-core GPUs. In this context, code portability is increasingly important for easy maintainability of applications; this is especially relevant in scientific computing, where code changes are numerous and frequent. In this talk we present the design and optimization of a state-of-the-art, production-level LQCD Monte Carlo application using the OpenACC directive-based model. OpenACC aims to abstract parallel programming to a descriptive level, where programmers do not need to specify how the code is mapped onto the target machine. We describe the OpenACC implementation and show that the same code is able to target different architectures, including state-of-the-art CPUs and GPUs.
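To illustrate what "descriptive" means here, the following is a minimal sketch in C and is not taken from the authors' code. The function name axpy_sites and its arguments are hypothetical. The directive only states that the loop iterations are independent and which arrays are read or updated; how iterations map onto threads, vector lanes, or GPU blocks is left to the compiler. A compiler without OpenACC support should simply ignore the pragma and run the loop serially.

```c
#include <stddef.h>

/* Hypothetical kernel (illustration only): y[i] += a * x[i] for n sites.
 * The OpenACC directive describes the loop as parallel and lists the data
 * it touches; it does not say how to map the work onto the hardware. */
void axpy_sites(int n, double a, const double *x, double *y)
{
    /* copyin: x is only read on the device; copy: y is read and written. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

The same source file can then be built for a CPU or a GPU target by changing compiler options only, which is the portability property the abstract refers to.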

Highlights

The use of processors based on multi- and many-core architectures is common practice in High Performance Computing (HPC)
Vector instructions are supported by these cores, with a moderate level of data parallelism: 2 to 4 vector elements are processed by one vector instruction
Lattice QCD simulations are a typical and well-known HPC grand challenge, where physics results are strongly limited by available computational resources [3, 4]; over the years, several generations of parallel machines optimized for LQCD have been developed [5, 6], while the development of LQCD codes running on many-core architectures, in particular Graphics Processing Units (GPUs), has seen large efforts in the last decade [7, 8, 9]

Summary

Introduction

The use of processors based on multi- and many-core architectures is common practice in High Performance Computing (HPC). Vector instructions are supported by these cores, with a moderate level of data parallelism: 2 to 4 vector elements are processed by one vector instruction. This architecture is reasonably efficient for many types of regular and non-regular applications and delivers performance of the order of hundreds of GigaFlops per processor. Lattice QCD simulations are a typical and well-known HPC grand challenge, where physics results are strongly limited by available computational resources [3, 4]; over the years, several generations of parallel machines optimized for LQCD have been developed [5, 6], while the development of LQCD codes running on many-core architectures, in particular GPUs, has seen large efforts in the last decade [7, 8, 9]. Migrating our code to OpenMP4, if needed once compiler support becomes more mature, is expected to require little effort.
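One way to see why a move from OpenACC to OpenMP4 could be a small change is that both models annotate the same loop. The sketch below is an assumption for illustration, not the authors' implementation: the function sum_of_squares is hypothetical, and the directive is selected through the standard _OPENACC and _OPENMP predefined macros. With neither model enabled, the loop runs serially.

```c
/* Hypothetical reduction kernel (illustration only): returns the sum of
 * v[i]*v[i]. The loop body is identical for both programming models;
 * only the directive line in front of it differs. */
double sum_of_squares(int n, const double *v)
{
    double total = 0.0;
#if defined(_OPENACC)
    /* OpenACC spelling: parallel loop with a sum reduction. */
    #pragma acc parallel loop reduction(+:total) copyin(v[0:n])
#elif defined(_OPENMP)
    /* OpenMP 4 spelling: offloaded loop with the same reduction. */
    #pragma omp target teams distribute parallel for reduction(+:total) \
        map(to: v[0:n]) map(tofrom: total)
#endif
    for (int i = 0; i < n; ++i) {
        total += v[i] * v[i];
    }
    return total;
}
```

In a real code base the data-movement clauses would more likely live in enclosing data regions than on each loop, so an actual port may involve more than swapping one line per kernel.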

Numerical challenges

Implementation

Performance analysis

Findings

Concluding Remarks

Journal: EPJ Web of Conferences
Publication Date: Jan 1, 2018
License type: CC BY 4.0

Similar Papers

Design and optimization of a portable LQCD Monte Carlo code using OpenACC
Claudio Bonati ... Francesco Negro
International Journal of Modern Physics C | VOL. 28
09 Mar 2017

Simple, accurate, and efficient implementation of 1-electron atomic time-dependent Schrödinger equation in spherical coordinates
Serguei Patchkovskii ... H.G Muller
Computer Physics Communications | VOL. 199
31 Oct 2015

Automatic Generation of Optimized OpenCL Codes Using OCLoptimizer
Jorge F Fabeiro ... Ramón Doallo
The computer journal | VOL. 58
02 Jun 2015

Phase-based tuning for better utilized performance-asymmetric multicores
Tyler Sondag
28 Apr 2012
