Abstract

We describe algorithms implemented in FDPS (Framework for Developing Particle Simulators) to make efficient use of accelerator hardware such as GPGPUs (general-purpose computing on graphics processing units). We have developed FDPS to make it possible for researchers to develop their own high-performance parallel particle-based simulation programs without spending large amounts of time on parallelization and performance tuning. FDPS provides a high-performance implementation of parallel algorithms for particle-based simulations in a “generic” form, so that researchers can define their own particle data structure and interparticle interaction functions. FDPS compiled with user-supplied data types and interaction functions provides all the necessary functions for parallelization, and researchers can thus write their programs as though they are writing simple non-parallel code. It has previously been possible to use accelerators with FDPS by writing an interaction function that uses the accelerator. However, the efficiency was limited by the latency and bandwidth of communication between the CPU and the accelerator, and also by the mismatch between the available degree of parallelism of the interaction function and that of the hardware parallelism. We have modified the interface of the user-provided interaction functions so that accelerators are more efficiently used. We also implemented new techniques which reduce the amount of work on the CPU side and the amount of communication between CPU and accelerators. We have measured the performance of N-body simulations on a system with an NVIDIA Volta GPGPU using FDPS and the achieved performance is around 27% of the theoretical peak limit. We have constructed a detailed performance model, and found that the current implementation can achieve good performance on systems with much smaller memory and communication bandwidth. Thus, our implementation will be applicable to future generations of accelerator systems.
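The "generic" form described in the abstract can be illustrated with a toy sketch. This is not FDPS code and does not use any FDPS identifier: `Particle`, `gravity_1d`, and `calc_force_all` are hypothetical names, the problem is one-dimensional with G = 1, and the framework side is reduced to a plain O(N^2) loop where a real framework would supply domain decomposition, tree construction, and interaction lists. The point is only the division of labour: the user writes the particle type and the pairwise interaction, and the framework owns the loop structure.

```python
# Hypothetical miniature of a "generic" particle framework.
# None of these names are FDPS identifiers.
from dataclasses import dataclass


@dataclass
class Particle:
    """User-defined particle type: 1D position, mass, accumulated acceleration."""
    x: float
    m: float
    acc: float = 0.0


def gravity_1d(pi, pj, eps2=1e-4):
    """User-defined interaction: softened gravity on pi due to pj (G = 1)."""
    dx = pj.x - pi.x
    r2 = dx * dx + eps2
    return pj.m * dx / (r2 * r2 ** 0.5)


def calc_force_all(particles, interaction):
    """Framework side: owns the loops and knows nothing about the physics.

    Assumption: a direct double loop stands in for the parallel tree
    algorithm that a real framework would provide.
    """
    for pi in particles:
        pi.acc = 0.0
        for pj in particles:
            if pj is not pi:
                pi.acc += interaction(pi, pj)


particles = [Particle(x=0.0, m=1.0), Particle(x=1.0, m=2.0), Particle(x=3.0, m=1.0)]
calc_force_all(particles, gravity_1d)
```

Swapping `gravity_1d` for another pairwise function changes the physics without touching `calc_force_all`, which is the property the abstract attributes to the generic design.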

Highlights

  • In this paper we describe new algorithms implemented in FDPS (Framework for Developing Particle Simulators: Iwasawa et al 2016; Namekata et al 2018), to make efficient use of accelerators such as GPGPUs

  • The main cause of this problem is that modern high-performance computing (HPC) platforms have become very complex, requiring a lot of effort to develop complex programs to make efficient use of such platforms

  • The GPGPU performs the calculations for multiple interaction lists in parallel

  • We have designed FDPS so that it provides all necessary functions for efficient parallel programming of particle-based simulations
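The idea of handing several interaction lists to the accelerator at once can be sketched as follows. This is a CPU-only stand-in under stated assumptions, not the interface defined in the paper: `FakeAccelerator` and its `run` method are hypothetical, each "group" is a pair of target positions and (position, mass) sources in one dimension, and a launch counter stands in for the per-call latency of a real device.

```python
class FakeAccelerator:
    """Hypothetical CPU stand-in for an accelerator; it only counts launches."""

    def __init__(self):
        self.launches = 0

    def run(self, groups, eps2=1e-4):
        """One launch: evaluate every (targets, sources) group in the batch."""
        self.launches += 1
        results = []
        for targets, sources in groups:
            acc = []
            for xi in targets:
                a = 0.0
                for xj, mj in sources:
                    dx = xj - xi
                    r2 = dx * dx + eps2
                    a += mj * dx / (r2 * r2 ** 0.5)  # softened gravity, G = 1
                acc.append(a)
            results.append(acc)
        return results


# Three interaction lists: target positions and (position, mass) sources.
groups = [
    ([0.0, 0.1], [(1.0, 1.0), (2.0, 0.5)]),
    ([5.0], [(4.0, 1.0)]),
    ([9.0, 9.5], [(8.0, 2.0)]),
]

# One launch per interaction list.
one_by_one = FakeAccelerator()
res_single = [one_by_one.run([g])[0] for g in groups]

# All interaction lists passed in a single launch.
batched = FakeAccelerator()
res_batched = batched.run(groups)

print(one_by_one.launches, batched.launches)
```

Both paths compute the same accelerations; the batched call pays the launch cost once and exposes more independent work per launch, which is the mismatch in available parallelism that the abstract mentions.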

Summary

Introduction

In this paper we describe new algorithms implemented in FDPS (Framework for Developing Particle Simulators: Iwasawa et al 2016; Namekata et al 2018), to make efficient use of accelerators such as GPGPUs (general-purpose computing on graphics processing units). Developing efficient parallel programs for particle-based simulations requires a very large amount of work, comparable to that of a large team of people working for many years. Even writing and debugging such a program is difficult, and it has become nearly impossible for a single person, or even a small group of people, to develop large-scale simulation programs that run efficiently on modern HPC systems. The extremely large number of nodes in such systems is just one of the many difficulties of using them, since even within one node there are many levels of parallelism that the programmer must take care of.

Overview of FDPS
Traditional approach to using accelerators and its limitation
New algorithms
Indirect addressing of particles
Reuse of interaction lists
Procedures with or without the new algorithms
APIs for using accelerators
Method
Performance model on a single node
Model of T_const_lt
Model of T_root
Model of T_const_gt
Model of T_reorder_gt
10 Tflops
Performance model on multiple nodes
Discussion and summary
Tree of domains
Findings
Procedure
Further improvement in single-node performance