Abstract

Today's HPC architectures are extremely heterogeneous, ranging from multi-core CPUs to many-core GPUs. In this context, code portability is increasingly important for keeping applications maintainable; this is especially relevant in scientific computing, where code changes are numerous and frequent. In this talk we present the design and optimization of a state-of-the-art, production-level LQCD Monte Carlo application using the OpenACC directives model. OpenACC aims to abstract parallel programming to a descriptive level, where programmers do not need to specify how the code is mapped onto the target machine. We describe the OpenACC implementation and show that the same code is able to target different architectures, including state-of-the-art CPUs and GPUs.
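To make the "descriptive" style concrete, the following is a minimal sketch of an OpenACC-annotated loop in C. It is not taken from the LQCD application: the function name `axpy` and the kernel itself are hypothetical, chosen only to illustrate how a directive states what can run in parallel and which data the loop touches, while leaving the mapping onto the hardware to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (an assumption, not from the paper):
   computes y[i] = a * x[i] + y[i] for i in [0, n). */
void axpy(int n, double a, const double *restrict x, double *restrict y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* The directive only describes the loop: its iterations are
       independent, x is read, and y is read and written.
       How iterations are distributed over threads, vector lanes or
       GPU cores is left to the compiler for the chosen target. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

A compiler without OpenACC support ignores the pragma and builds the loop as ordinary serial C, so the same source remains valid everywhere. With an OpenACC-capable compiler (for example `nvc -acc` or `gcc -fopenacc`), the same file can be built for a multi-core CPU or a GPU target without changing the loop body.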

Highlights

  • The use of processors based on multi- and many-core architectures is common practice in High Performance Computing (HPC)

  • Vector instructions are supported by these cores, with a moderate level of data parallelism: 2 to 4 vector elements are processed by one vector instruction

  • Lattice QCD simulations are a typical and well-known HPC grand challenge, where physics results are strongly limited by available computational resources [3, 4]; over the years, several generations of parallel machines optimized for LQCD have been developed [5, 6], while the development of LQCD codes running on many-core architectures, in particular Graphics Processing Units (GPUs), has seen large efforts in the last decade [7, 8, 9]


Summary

Introduction

The use of processors based on multi- and many-core architectures is common practice in High Performance Computing (HPC). These cores support vector instructions with a moderate level of data parallelism: one vector instruction processes 2 to 4 vector elements. This architecture is reasonably efficient for many types of regular and non-regular applications and delivers performance of the order of hundreds of GigaFlops per processor. Lattice QCD simulations are a typical and well-known HPC grand challenge, where physics results are strongly limited by available computational resources [3, 4]; over the years, several generations of parallel machines optimized for LQCD have been developed [5, 6], while the development of LQCD codes running on many-core architectures, in particular GPUs, has seen large efforts in the last decade [7, 8, 9]. Migrating our code to OpenMP4, should that become necessary once compiler support matures, is expected to require little effort.

Numerical challenges
Implementation
Performance analysis
Findings
Concluding Remarks