Graphics Processing Units Code Research Articles

Abstract. Lagrangian models are fundamental tools for studying atmospheric transport processes and for practical applications such as dispersion modeling of anthropogenic and natural emission sources. However, large-scale Lagrangian transport simulations with millions of air parcels or more can be numerically costly. In this study, we assessed the potential of graphics processing units (GPUs) to accelerate Lagrangian transport simulations. We ported the Massive-Parallel Trajectory Calculations (MPTRAC) model to GPUs using the open accelerator (OpenACC) programming model. The trajectory calculations in MPTRAC were fully ported: apart from reading the meteorological input data and writing the particle output data, the code runs entirely on the GPU devices, without frequent data transfers between CPU and GPU memory. Model verification, performance analyses, and scaling tests of the hybrid Message Passing Interface (MPI) / Open Multi-Processing (OpenMP) / OpenACC parallelization of MPTRAC were conducted on the Jülich Wizard for European Leadership Science (JUWELS) Booster supercomputer operated by the Jülich Supercomputing Centre, Germany. The JUWELS Booster comprises 3744 NVIDIA A100 Tensor Core GPUs and provides a peak performance of 71.0 PFlop s−1. As of June 2021, it is the most powerful supercomputer in Europe and is listed among the most energy-efficient systems internationally. For large-scale simulations comprising 10⁸ particles driven by the European Centre for Medium-Range Weather Forecasts' fifth-generation reanalysis (ERA5), the performance evaluation showed a maximum speed-up by a factor of 16 from using GPUs compared to CPU-only runs on the JUWELS Booster. In the large-scale GPU run, about 67 % of the runtime is spent on the physics calculations, which run on the GPUs. Another 15 % of the runtime is required for file I/O, mostly to read the large ERA5 data set from disk, and meteorological data preprocessing on the CPUs takes about another 15 %. Although this study identified potential for further improvements of the GPU code, we consider the MPTRAC model ready for production runs on the JUWELS Booster in its present form. The GPU code provides a much faster time to solution than the CPU code, which is particularly relevant for near-real-time applications of a Lagrangian transport model.
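The data-residency pattern described in this abstract (particle data kept on the GPU, with transfers only for input and output) can be illustrated with a minimal C sketch using OpenACC directives. This is not MPTRAC source code: the function name `advect_particles`, the constant-wind Euler step, and the longitude/latitude array layout are hypothetical. The sketch only shows that the particle arrays are copied to the GPU once, stay resident across all time steps, and return to host memory when the data region ends.

```c
#include <assert.h>

/* Hypothetical sketch, not MPTRAC source: advance np air parcels for
   nsteps explicit Euler steps with a constant wind (u, v). MPTRAC's real
   physics (interpolated ERA5 winds, diffusion, etc.) is far richer. */
void advect_particles(double *restrict lon, double *restrict lat, int np,
                      double u, double v, double dt, int nsteps) {
  assert(np >= 0 && nsteps >= 0);

  /* Copy the particle arrays to the GPU once; they stay resident for all
     time steps and are copied back to the host when the region ends. */
#pragma acc data copy(lon[0:np], lat[0:np])
  {
    for (int step = 0; step < nsteps; step++) {
      /* A real model would refresh meteorological fields here (e.g. with
         an "acc update device" directive) rather than move particle data. */
#pragma acc parallel loop present(lon[0:np], lat[0:np])
      for (int ip = 0; ip < np; ip++) {
        lon[ip] += u * dt;
        lat[ip] += v * dt;
      }
    }
  }
}
```

With an OpenACC-capable compiler (for example, the NVIDIA HPC SDK's `nvc -acc`), the loop runs on the GPU; a compiler without OpenACC support ignores the pragmas and runs the same code serially on the CPU, which makes the sketch easy to check against a CPU reference.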

Elegant is an accelerator physics and particle-beam dynamics code widely used for modeling and designing a variety of high-energy particle accelerators and accelerator-based systems. In this paper we discuss a recently developed version of the code that can take advantage of CUDA-enabled graphics processing units (GPUs) to achieve significantly improved performance for a large class of simulations that are important in practice. The GPU version is largely defined by a framework that simplifies implementations of the fundamental kernel types used by Elegant: particle operations, reductions, particle loss, histograms, array convolutions, and random number generation. On the Titan Cray XK7 supercomputer, the GPU version runs approximately 6–10 times faster than a run using all the CPU cores on the same number of nodes. In addition to performance, maintainability of the GPU-accelerated version of the code was a key design objective. Accuracy with respect to the CPU implementation is also a core consideration: four different methods are used to ensure that the accelerated code faithfully reproduces the CPU results.

Program summary
Program Title: Kernels from the GPU-accelerated Elegant
Program Files doi: http://dx.doi.org/10.17632/jc465zy7p5.1
Licensing provisions: MIT
Programming language: C/C++/CUDA
Nature of problem: The original Elegant accelerator physics code was designed for central processing units with message-passing interface (MPI) parallelization. This implementation cannot make use of next-generation multicore systems.
Solution method: In this package we develop routines based on the CUDA language extensions to C++ that enable the Elegant code to run on graphics processing units (GPUs). Special consideration is given to algorithms that require collective communication on the GPU.
Additional comments including restrictions and unusual features: The full Elegant source code is freely available from Argonne National Laboratory, and later releases include the GPU code.
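Three of the kernel types named above (reductions, histograms, and particle loss) can be illustrated with a small C sketch. This is not code from Elegant, whose GPU version is written in CUDA; the function names `mean_x`, `histogram_x`, and `apply_aperture`, the structure-of-arrays particle layout (one array per coordinate), and the use of OpenACC directives instead of CUDA (chosen only to keep the sketch compilable as plain C) are all assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical particle layout: one array per phase-space coordinate,
   here only the position x and its slope xp. Not Elegant's data layout. */

/* Reduction: mean of x over all particles (e.g. a beam centroid). */
double mean_x(const double *x, int np) {
  assert(np > 0);
  double sum = 0.0;
#pragma acc parallel loop reduction(+:sum) copyin(x[0:np])
  for (int i = 0; i < np; i++)
    sum += x[i];
  return sum / np;
}

/* Histogram: count particles per bin over [xmin, xmax); particles outside
   the range are skipped. Concurrent bin updates need an atomic on a GPU. */
void histogram_x(const double *x, int np, double xmin, double xmax,
                 int nbins, int *counts) {
  assert(nbins > 0 && xmax > xmin);
  for (int b = 0; b < nbins; b++)
    counts[b] = 0;
  double width = (xmax - xmin) / nbins;
#pragma acc parallel loop copyin(x[0:np]) copy(counts[0:nbins])
  for (int i = 0; i < np; i++) {
    if (x[i] < xmin || x[i] >= xmax)
      continue;
    int b = (int)((x[i] - xmin) / width);
    if (b >= nbins)
      b = nbins - 1; /* guard against rounding just below xmax */
#pragma acc atomic update
    counts[b]++;
  }
}

/* Particle loss: drop particles with |x| > aperture and compact the
   survivors to the front of the arrays, preserving their order.
   Returns the number of survivors. This is a sequential reference loop. */
int apply_aperture(double *x, double *xp, int np, double aperture) {
  assert(np >= 0 && aperture >= 0.0);
  int kept = 0;
  for (int i = 0; i < np; i++) {
    if (x[i] >= -aperture && x[i] <= aperture) {
      x[kept] = x[i];
      xp[kept] = xp[i];
      kept++;
    }
  }
  return kept;
}
```

The histogram uses an atomic update because several GPU threads may increment the same bin at once. The particle-loss routine is kept sequential on purpose: a parallel version typically computes each survivor's output index with a prefix sum (scan), which is a collective operation across all particles.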

Articles published on Graphics Processing Units Code

Accelerating Lagrangian transport simulations on graphics processing units: performance optimizations of Massive-Parallel Trajectory Calculations (MPTRAC) v2.6

A GPU based accelerated solver for simulation of heat transfer during metal casting process

A Simplified GPU Implementation of the Hybrid Lattice Boltzmann Model for Three-Dimensional High Rayleigh Number Flows

Three-dimensional third-order gas-kinetic scheme on hybrid unstructured meshes for Euler and Navier–Stokes equations

Quantum Mechanics/Molecular Mechanics Simulations on NVIDIA and AMD Graphics Processing Units.

High-performance GPU-accelerated evaluation of electron repulsion integrals

Esoteric Pull and Esoteric Push: Two Simple In-Place Streaming Schemes for the Lattice Boltzmann Method on GPUs

Three-dimensional discontinuous Galerkin based high-order gas-kinetic scheme and GPU implementation

Massive-Parallel Trajectory Calculations version 2.2 (MPTRAC-2.2): Lagrangian transport simulations on graphics processing units (GPUs)

An Automated Tool for Analysis and Tuning of GPU-Accelerated Code in HPC Applications

Autotuning GPU code for acceleration of CGH calculation

Faster Self-Consistent Field (SCF) Calculations on GPU Clusters.

Analysis of GPU Computation of Parabolic, Bessel, Wright and Riemann Zeta Functions

GPU-accelerated Monte Carlo simulation of MV-CBCT

High-Performance, Graphics Processing Unit-Accelerated Fock Build Algorithm.

Evaluation of Clustering Algorithms on GPU-Based Edge Computing Platforms.

Boosting Free-Energy Perturbation Calculations with GPU-Accelerated NAMD.

Toward large-scale simulation of residual stress and distortion in wire and arc additive manufacturing

Implementation and performance analysis of the massively parallel method of characteristics based on GPU

GPU acceleration and performance of the particle-beam-dynamics code Elegant
