Hybrid OpenMP Research Articles

Applications with long processing times and large storage requirements are parallelized on High Performance Computing (HPC) systems using shared-memory and distributed-memory programming. A parallel application is represented as a Parallel Task Graph (PTG), modeled as a Directed Acyclic Graph (DAG). To execute PTGs on HPC systems, a scheduler runs in two phases, scheduling and allocation; the problem the scheduler solves is considered an NP-complete combinatorial problem and demands large amounts of storage and long processing times. The Array Method (AM) is a scheduler that schedules tasks across a set of clusters; in previous work it was implemented sequentially, then analyzed and tested with real and synthetic application workloads. Building on the designs proposed for this method, this work extends its parallelization using hybrid OpenMP and MPI programming on a server farm and on a set of geographically distributed clusters, and proposes a novel method for searching for free resources in clusters using Lévy random walks. Experiments with synthetic and real workloads evaluate the performance of the new parallel scheduler and compare it with the sequential one. The metrics evaluated are makespan, waiting time, quality of assignments and search for free resources; the results reported in the experiments section show that the parallel version of the algorithm outperforms the sequential version. Applying the hybrid parallel approach to the extraction of PTG characteristics, to the search for geographically distributed resources with Lévy random walks and to the metaheuristic used improves the results for these metrics. The makespan decreases even as the load increases, tasks spend less time in the waiting queue, the quality of assignments improves because tasks and their subtasks are placed in the same cluster or in neighboring clusters, and the searches for free resources run concurrently across the geographically distributed clusters rather than sequentially.
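The abstract does not say how the Lévy random walk search is implemented, so the following is only a minimal sketch of the general idea. It assumes the clusters sit on a ring indexed 0..n-1 and that each reports a count of free cores; `levy_search`, `free_cores`, `needed`, `alpha` and `max_hops` are hypothetical names introduced here. Step lengths are drawn from a Pareto (power-law) distribution, so most hops are short and a few are long, which is the defining trait of a Lévy walk.

```python
import random


def levy_search(free_cores, needed, start, rng, alpha=1.5, max_hops=1000):
    """Walk over clusters until one has at least `needed` free cores.

    free_cores : list of free-core counts, one per cluster (assumed ring layout)
    Returns (cluster_index, hops), or (None, max_hops) if the walk gives up.
    """
    n = len(free_cores)
    pos = start
    for hops in range(max_hops + 1):
        if free_cores[pos] >= needed:
            return pos, hops
        # Heavy-tailed step length: paretovariate always returns a value >= 1.
        step = int(rng.paretovariate(alpha))
        direction = rng.choice((-1, 1))
        # Wrap around the ring; Python's % keeps the index non-negative.
        pos = (pos + direction * step) % n
    return None, max_hops


if __name__ == "__main__":
    clusters = [0] * 50
    clusters[37] = 16  # only one cluster has spare capacity
    print(levy_search(clusters, needed=8, start=0, rng=random.Random(0)))
```

In a parallel setting, several such walks could start from different clusters at once, which would correspond to the non-sequential search the abstract describes; that mapping is an assumption, not a detail taken from the paper.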

• Parallelization of stencil-based 2D MPDATA algorithm across both GPUs and CPUs.
• Minimization of communication between CPU and GPU at the cost of extra computations.
• Adaptation of MPDATA to CPUs using space and temporal blocking techniques.
• Adaptation of MPDATA to GPUs based on hierarchical decomposition.
• Approach to optimization of MPDATA on GPUs using autotuning technique.

EULAG (Eulerian/semi-Lagrangian fluid solver) is an established computational model developed for simulating thermo-fluid flows across a wide range of scales and physical scenarios. The dynamic core of EULAG includes the multidimensional positive definite advection transport algorithm (MPDATA) and an elliptic solver. In this work we investigate aspects of an optimal parallel version of the 2D MPDATA algorithm on modern hybrid architectures with GPU accelerators, where computations are distributed across both GPU and CPU components. Using the hybrid OpenMP–OpenCL model of parallel programming opens the way to harness the power of CPU–GPU platforms in a portable way. In order to better utilize features of such computing platforms, comprehensive adaptations of MPDATA computations to hybrid architectures are proposed. These adaptations are based on efficient strategies for memory and computing resource management, which allow us to ease memory and communication bounds, and better exploit the theoretical floating point efficiency of CPU–GPU platforms.

The main contributions of the paper are:
• method for the decomposition of the 2D MPDATA algorithm as a tool to adapt MPDATA computations to hybrid architectures with GPU accelerators by minimizing communication and synchronization between CPU and GPU components at the cost of additional computations;
• method for the adaptation of 2D MPDATA computations to multicore CPU platforms, based on space and temporal blocking techniques;
• method for the adaptation of the 2D MPDATA algorithm to GPU architectures, based on a hierarchical decomposition strategy across data and computation domains, with support provided by the developed GPU task scheduler allowing for the flexible management of available resources;
• approach to the parametric optimization of 2D MPDATA computations on GPUs using the autotuning technique, which allows us to provide a portable implementation methodology across a variety of GPUs.

Hybrid platforms tested in this study contain different numbers of CPUs and GPUs, from solutions consisting of a single CPU and a single GPU to the most elaborate configuration containing two CPUs and two GPUs. Processors of different vendors are employed in these systems: both Intel and AMD CPUs, as well as GPUs from NVIDIA and AMD. For all the grid sizes and for all the tested platforms, the hybrid version with computations spread across CPU and GPU components achieves the highest performance. In particular, for the largest MPDATA grids used in our experiments, the speedups of the hybrid versions over the GPU and CPU versions vary from 1.30 to 1.69, and from 1.95 to 2.25, respectively.
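The idea of trading communication for extra computation can be illustrated with a toy example. The sketch below is not MPDATA and not the paper's decomposition: it uses a generic 1D three-point averaging stencil, and the function names are hypothetical. Each half of the domain is given `steps` extra halo cells, so both halves can advance `steps` time steps with no exchange in between, at the cost of redundantly updating the overlap. It assumes a domain of at least four cells.

```python
def stencil_step(u):
    """One step of a 3-point averaging stencil; the two end values stay fixed."""
    interior = [(u[i - 1] + u[i] + u[i + 1]) / 3.0 for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def run_global(u, steps):
    """Reference: advance the whole domain `steps` time steps."""
    for _ in range(steps):
        u = stencil_step(u)
    return u


def run_split(u, steps):
    """Advance two overlapping halves independently, then stitch them.

    Each half carries `steps` halo cells past the midpoint. The error from
    the artificial cut moves inward by one cell per step, so after `steps`
    steps it has consumed only the halo and the owned cells are still exact.
    """
    n = len(u)
    mid = n // 2
    lo = max(0, mid - steps)  # right half starts `steps` cells early
    hi = min(n, mid + steps)  # left half ends `steps` cells late
    left = run_global(u[:hi], steps)
    right = run_global(u[lo:], steps)
    # Keep only the cells each half owns; the halo results are discarded.
    return left[:mid] + right[mid - lo:]


if __name__ == "__main__":
    u0 = [float((7 * i) % 11) for i in range(40)]
    same = all(abs(a - b) < 1e-12
               for a, b in zip(run_global(u0, 5), run_split(u0, 5)))
    print("split result matches global result:", same)
```

The two calls to `run_global` inside `run_split` are independent, so in a hybrid setting one could run on the CPU and the other on the GPU; the wider the halo, the fewer synchronizations are needed and the more redundant work is done.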

Articles published on Hybrid OpenMP

Heterogeneous programming using OpenMP and CUDA/HIP for hybrid CPU-GPU scientific applications

Parallelization of Array Method with Hybrid Programming: OpenMP and MPI

Multiscale modeling of coupled thermo-mechanical behavior of granular media in large deformation and flow

Fast and Accurate Solution of Integral Formulations of Large MQS Problems Based on Hybrid OpenMP–MPI Parallelization

Runtime Support for OpenMP Hybrid CPU-GPU Applications

Computational Efficiency Examination of a Regional Numerical Weather Prediction Model using KISTI Supercomputer NURION

High-Performance Computing Implementations of Agent-Based Economic Models for Realizing 1:1 Scale Simulations of Large Economies

Lattice Boltzmann simulations of magnetic particles in a three-dimensional microchannel

Spatiotemporal parallelization of an analytical heat conduction model for additive manufacturing via a hybrid OpenMP + MPI approach

CFD-DEM simulation of fluidization of multisphere-modelled cylindrical particles

Exploiting OpenMP and OpenACC to accelerate a geometric approach to molecular docking in heterogeneous HPC nodes

A parallel unidirectional coupled DEM-PBM model for the efficient simulation of computationally intensive particulate process systems

Improvement of workload balancing using parallel loop self-scheduling on Intel Xeon Phi

A Fixed-Mesh Approach for Gas-Liquid-Rigid Interaction Problems

Performance optimizations for scalable implicit RANS calculations with SU2

MEGADOCK 4.0: An Ultra-High-Performance Protein-Protein Docking Software for Heterogeneous Supercomputers

Real-time and real-space program tuned in K-computer

Developing a scalable hybrid MPI/OpenMP unstructured finite element model

Parallelization of 2D MPDATA EULAG algorithm on hybrid architectures with GPU accelerators

Photon-plasma: A modern high-order particle-in-cell code

