Hybrid CPU-GPU Research Articles

An algorithm is presented for the coupled-cluster singles, doubles, and perturbative triples correction [CCSD(T)] method based on the density fitting or the resolution-of-the-identity (RI) approximation for performing calculations on heterogeneous computing platforms composed of multicore CPUs and graphics processing units (GPUs). The directive-based approach to GPU offloading offered by the OpenMP application programming interface has been employed to adapt the most compute-intensive terms in the RI-CCSD amplitude equations with computational costs scaling as , , and (where NO and NV denote the numbers of correlated occupied and virtual orbitals, respectively) and the perturbative triples correction to execute on GPU architectures. The pertinent tensor contractions are performed using an accelerated math library such as cuBLAS or hipBLAS. Optimal strategies are discussed for splitting large data arrays into tiles to fit them into the relatively small memory space of the GPUs, while also minimizing the low-bandwidth CPU-GPU data transfers. The performance of the hybrid CPU-GPU RI-CCSD(T) code is demonstrated on pre-exascale supercomputers composed of heterogeneous nodes equipped with NVIDIA Tesla V100 and A100 GPUs and on the world's first exascale supercomputer named "Frontier", the nodes of which consist of AMD MI250X GPUs. Speedups within the range 4-8× relative to the recently reported CPU-only algorithm are obtained for the GPU-offloaded terms in the RI-CCSD amplitude equations. Applications to polycyclic aromatic hydrocarbons containing 16-66 carbon atoms demonstrate that the acceleration of the hybrid CPU-GPU code for the perturbative triples correction relative to the CPU-only code increases with the molecule size, attaining a speedup of 5.7× for the largest circumovalene molecule (C66H20). The GPU-offloaded code enables the computation of the perturbative triples correction for the C60 molecule using the cc-pVDZ/aug-cc-pVTZ-RI basis sets in 7 min on Frontier when using 12,288 AMD GPUs with a parallel efficiency of 83.1%.

Read full abstract

Thanks to the computational power of modern cluster machines, numerical simulations can provide, with an unprecedented level of details, new insights into fluid mechanics. However, taking full advantage of this hardware remains challenging since data communication remains a significant bottleneck to reaching peak performances. Reducing floating point precision is a simple and effective way to reduce data movement and improve the computational speed of most applications. Nevertheless, special care needs to be taken to ensure the quality and convergence of computed solutions, especially when dealing with complex fluid simulations. In this work, we analyse the impact of reduced (single and mixed compared to double) precision on computational performance and accuracy for computational fluid dynamics. Using the open source library OpenFOAM, we consider incompressible, compressible, and multiphase fluid solvers for testing on relevant benchmarks for flows in the laminar and turbulent regime and in the presence of shock waves. Computational gain and changes in the scalability of applications in reduced precision are also discussed. In particular, an ad hoc theoretical model for the strong scaling allows us to interpret and understand the observed behaviours, as a function of floating point precision and hardware specifics. Finally, we show how reduced precision can significantly speed up a hybrid CPU–GPU implementation, made available to OpenFOAM end-users recently, that simply relies on a GPU linear algebra solver developed by hardware vendors.

Read full abstract

Hybrid CPU-GPU Research Articles

Articles published on Hybrid CPU-GPU

Mixed-precision pre-pivoting strategy for the LU factorization

High-Performance Flow Classification of Big Data Using Hybrid CPU-GPU Clusters of Cloud Environments

Two-dimensional quantum lattice models via mode optimized hybrid CPU-GPU density matrix renormalization group method

Direct reduction of iron-ore with hydrogen in fluidized beds: A coarse-grained CFD-DEM-IBM study

A comparison of Algebraic Multigrid Bidomain solvers on hybrid CPU–GPU architectures

Accelerating Pythonic Coupled-Cluster Implementations: A Comparison Between CPUs and GPUs.

Hybrid CPU-GPU solution to regularized divergence-free curl-curl equations for electromagnetic inversion problems

Optimizing MRI Data Processing by exploiting GPU Acceleration for Efficient Image Analysis and Reconstruction

Hybridhadoop: CPU-GPU hybrid scheduling in hadoop

Enhancing Adaptive Physics Refinement Simulations Through the Addition of Realistic Red Blood Cell Counts.

Accelerating Coupled-Cluster Calculations with GPUs: An Implementation of the Density-Fitted CCSD(T) Approach for Heterogeneous Computing Architectures Using OpenMP Directives.

On floating point precision in computational fluid dynamics using OpenFOAM

Heterogeneous programming using OpenMP and CUDA/HIP for hybrid CPU-GPU scientific applications

Optimizing Voronoi-based quantifications for reaching interactive analysis of 3D localizations in the million range.

A Review of High-Performance Computing Methods for Power Flow Analysis

A SCALABLE HYBRID CPU-GPU COMPUTATIONAL FRAMEWORK FOR A FINITE ELEMENT-BASED AIR QUALITY MODEL

A hybrid CPU–GPU solver based on three-dimensional spectral Chebyshev technique for determining the dynamic behavior of thick sandwich panels

Fine-grained heterogeneous parallel direct solver for finite element problems

Energy-Aware Non-Preemptive Task Scheduling With Deadline Constraint in DVFS-Enabled Heterogeneous Clusters

High Performance Adaptive Physics Refinement to Enable Large-Scale Tracking of Cancer Cell Trajectory.

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Hybrid CPU-GPU Research Articles

Articles published on Hybrid CPU-GPU

Mixed-precision pre-pivoting strategy for the LU factorization

High-Performance Flow Classification of Big Data Using Hybrid CPU-GPU Clusters of Cloud Environments

Two-dimensional quantum lattice models via mode optimized hybrid CPU-GPU density matrix renormalization group method

Direct reduction of iron-ore with hydrogen in fluidized beds: A coarse-grained CFD-DEM-IBM study

A comparison of Algebraic Multigrid Bidomain solvers on hybrid CPU–GPU architectures

Accelerating Pythonic Coupled-Cluster Implementations: A Comparison Between CPUs and GPUs.

Hybrid CPU-GPU solution to regularized divergence-free curl-curl equations for electromagnetic inversion problems

Optimizing MRI Data Processing by exploiting GPU Acceleration for Efficient Image Analysis and Reconstruction

Hybridhadoop: CPU-GPU hybrid scheduling in hadoop

Enhancing Adaptive Physics Refinement Simulations Through the Addition of Realistic Red Blood Cell Counts.

Accelerating Coupled-Cluster Calculations with GPUs: An Implementation of the Density-Fitted CCSD(T) Approach for Heterogeneous Computing Architectures Using OpenMP Directives.

On floating point precision in computational fluid dynamics using OpenFOAM

Heterogeneous programming using OpenMP and CUDA/HIP for hybrid CPU-GPU scientific applications

Optimizing Voronoi-based quantifications for reaching interactive analysis of 3D localizations in the million range.

A Review of High-Performance Computing Methods for Power Flow Analysis

A SCALABLE HYBRID CPU-GPU COMPUTATIONAL FRAMEWORK FOR A FINITE ELEMENT-BASED AIR QUALITY MODEL

A hybrid CPU–GPU solver based on three-dimensional spectral Chebyshev technique for determining the dynamic behavior of thick sandwich panels

Fine-grained heterogeneous parallel direct solver for finite element problems

Energy-Aware Non-Preemptive Task Scheduling With Deadline Constraint in DVFS-Enabled Heterogeneous Clusters

High Performance Adaptive Physics Refinement to Enable Large-Scale Tracking of Cancer Cell Trajectory.