CPU GPU Research Articles

Super-resolution (SR) generative adversarial networks (GANs) are promising for turbulence closure in large-eddy simulation (LES) due to their ability to accurately reconstruct high-resolution data from low-resolution fields. Current model training and inference strategies are not sufficiently mature for large-scale, distributed calculations due to the computational demands and often unstable training of SR-GANs, which limits the exploration of improved model structures, training strategies, and loss-function definitions. Integrating SR-GANs into LES solvers for inference-coupled simulations is also necessary to assess their a posteriori accuracy, stability, and cost. We investigate parallelization strategies for SR-GAN training and inference-coupled LES, focusing on computational performance and reconstruction accuracy. We examine distributed data-parallel training strategies for hybrid CPU–GPU node architectures and the associated influence of low-/high-resolution subbox size, global batch size, and discriminator accuracy. Accurate predictions require training subboxes that are sufficiently large relative to the Kolmogorov length scale. Care should be taken with the coupled effect of training batch size, learning rate, number of training subboxes, and the discriminator’s learning capabilities. We introduce a data-parallel SR-GAN training and inference library for heterogeneous architectures that enables exchange between the LES solver and SR-GAN inference at runtime. We investigate the predictive accuracy and computational performance of this arrangement with particular focus on the overlap (halo) size required for accurate SR reconstruction. Similarly, a posteriori parallel scaling for efficient inference-coupled LES is constrained by the SR subdomain size, GPU utilization, and reconstruction accuracy. Based on these findings, we establish guidelines and best practices to optimize resource utilization and parallel acceleration of SR-GAN turbulence model training and inference-coupled LES calculations while maintaining predictive accuracy.

Radiative transfer (RT) is a crucial ingredient for self-consistent modelling of numerous astrophysical phenomena across cosmic history. However, on-the-fly integration into radiation hydrodynamics (RHD) simulations is computationally demanding, particularly due to the stringent time-stepping conditions and increased dimensionality inherent in multifrequency collisionless Boltzmann physics. The emergence of exascale supercomputers, equipped with extensive CPU cores and GPU accelerators, offers new opportunities for enhancing RHD simulations. We present the first steps towards optimizing arepo-rt for such high-performance computing environments. We implement a novel node-to-node (n-to-n) communication strategy that utilizes shared memory to substitute intranode communication with direct memory access. Furthermore, combining multiple internode messages into a single message substantially enhances network bandwidth utilization and performance for large-scale simulations on modern supercomputers. The single-message n-to-n approach also improves performance on smaller scale machines with less optimized networks. In addition, by transitioning all RT-related calculations to GPUs, we achieve a significant computational speedup of a factor of around 15 for standard benchmarks compared to the original CPU implementation. As a case study, we perform cosmological RHD simulations of the Epoch of Reionization, employing a setup similar to that of the THESAN project. In this context, RT becomes sub-dominant such that even without modifying the core arepo codebase, there is an overall threefold improvement in efficiency. The advancements presented here have broad implications, potentially transforming the complexity and scalability of future simulations for a wide variety of astrophysical studies. Our work serves as a blueprint for porting similar simulation codes based on unstructured resolution elements to GPU-centric architectures.

Articles published on CPU GPU

Exploring data flow design and vectorization with oneAPI for streaming applications on CPU+GPU

Swarm–Intelligence-Based Task Scheduling for Reliability Optimization of Integrated CPU–GPU Edge Platforms in Cyber–Physical–Social Systems

Parallel implementation and performance of super-resolution generative adversarial network turbulence models for large-eddy simulation

A Survey on Heterogeneous CPU–GPU Architectures and Simulators

A Comprehensive of CPU and GPU Performance and Applications in Autonomous Vehicles

Performance Study of an MRI Motion-Compensated Reconstruction Program on Intel CPUs, AMD EPYC CPUs, and NVIDIA GPUs

MicroMagnetic.jl: A Julia package for micromagnetic and atomistic simulations with GPU support

Anatomizing Deep Learning Inference in Web Browsers

A high-performance dynamic scheduling for sparse matrix-based applications on heterogeneous CPU–GPU environment

Adapting arepo-rt for exascale computing: GPU acceleration and efficient communication

Modeling of the fracture behaviors of concrete using 3D discrete element method with softening effect

Maboss for HPC environments: implementations of the continuous time Boolean model simulator for large CPU clusters and GPU accelerators

Allok: a machine learning approach for efficient graph execution on CPU–GPU clusters

CPU–GPU heterogeneous code acceleration of a finite volume Computational Fluid Dynamics solver

Physical mechanism-corrected degradation trend prediction network under data missing

CPU–GPU Heterogeneous Computation Offloading and Resource Allocation Scheme for Industrial Internet of Things

Direct reduction of iron-ore with hydrogen in fluidized beds: A coarse-grained CFD-DEM-IBM study

A comparison of Algebraic Multigrid Bidomain solvers on hybrid CPU–GPU architectures

Accelerating Sampling and Aggregation Operations in GNN Frameworks with GPU Initiated Direct Storage Accesses

End‐to‐End Compressed Meshlet Rendering
