Job packing is an effective technique for harvesting idle resources that are allocated to deep learning (DL) training jobs but not fully utilized, especially since clusters often experience low utilization and users tend to overestimate their resource needs. However, existing job packing techniques tend to be conservative due to the mismatch in scope and granularity between job packing and cluster scheduling. In particular, tapping the full potential of job packing in a training cluster requires a local, fine-grained coordination mechanism. To this end, we propose Gimbal, a novel job-packing middleware that operates between the cluster scheduler and the hardware resources. As middleware, Gimbal must not only facilitate coordination among packed jobs but also support the diverse scheduling objectives of different schedulers. Gimbal achieves this dual functionality by introducing a set of worker calibration primitives designed to calibrate workers' execution status in a fine-grained manner. The primitives hide the complexity of the underlying job and resource management mechanisms, offering the generality and extensibility needed to craft coordination policies tailored to various scheduling objectives. We implement Gimbal on a real-world GPU cluster and evaluate it with a set of representative DL training jobs. The results show that Gimbal improves different scheduling objectives by up to 1.32× compared with state-of-the-art job packing techniques.
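The abstract does not spell out what the worker calibration primitives look like. As a rough illustration of the idea, the minimal Python sketch below shows one possible shape for such an interface; all names (CalibrationPrimitive, Throttle, Suspend, coordinate) and their semantics are hypothetical assumptions, not Gimbal's actual API.

```python
# Hypothetical sketch of a worker-calibration primitive interface.
# Names and semantics are illustrative assumptions, not Gimbal's API.
from abc import ABC, abstractmethod
from enum import Enum, auto


class WorkerState(Enum):
    RUNNING = auto()
    THROTTLED = auto()
    SUSPENDED = auto()


class CalibrationPrimitive(ABC):
    """One fine-grained adjustment applied to a packed worker."""

    @abstractmethod
    def apply(self, worker_id: str) -> WorkerState:
        ...


class Throttle(CalibrationPrimitive):
    """Cap a worker's GPU share so a co-packed job can use the rest."""

    def __init__(self, gpu_fraction: float):
        self.gpu_fraction = gpu_fraction

    def apply(self, worker_id: str) -> WorkerState:
        # A real system would call into a GPU sharing layer here;
        # this sketch only records the intent.
        print(f"throttle {worker_id} to {self.gpu_fraction:.0%} of the GPU")
        return WorkerState.THROTTLED


class Suspend(CalibrationPrimitive):
    """Temporarily park a worker to yield resources to a packed peer."""

    def apply(self, worker_id: str) -> WorkerState:
        print(f"suspend {worker_id}")
        return WorkerState.SUSPENDED


# A coordination policy is then just a sequence of primitives chosen
# to serve the scheduler's objective (throughput, fairness, ...).
def coordinate(policy: list[tuple[str, CalibrationPrimitive]]) -> None:
    for worker_id, primitive in policy:
        primitive.apply(worker_id)


if __name__ == "__main__":
    coordinate([
        ("job-A/worker-0", Throttle(gpu_fraction=0.5)),
        ("job-B/worker-0", Suspend()),
    ])
```

The point of such an interface, as the abstract describes it, is that policies compose primitives without knowing how the underlying job and resource management is implemented.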