Compiler Directives Research Articles

A commonly used approach to developing deterministic parallel programs is to augment a sequential program with compiler directives that indicate which program blocks may be executed in parallel. This paper develops a verification technique to reason about such compiler directives, in particular to show that they do not change the behaviour of the program. The technique is tool-supported and can be combined with proofs of functional correctness of the program. To develop it, we propose a simple intermediate representation (syntax and semantics) that captures the main forms of deterministic parallel programs. This language distinguishes three kinds of basic blocks, namely parallel, vectorised and sequential blocks, which can be composed using three composition operators: sequential, parallel and fusion composition. We show how a widely used subset of OpenMP can be encoded into this intermediate representation. Our verification technique builds on the notion of iteration contracts to specify the behaviour of basic blocks; we show that manually specifying iteration contracts for individual blocks is sufficient to reason automatically about data race freedom of the composed program. We also show that establishing functional correctness of a linearised version of the original program is sufficient to conclude functional correctness of the parallel program. Finally, we illustrate our approach on an example OpenMP program and discuss how tool support is provided.

This work shows that, simply by adding a few standard OpenMP compiler directives and without any major source code modifications, simulations of both steady and unsteady incompressible fluid flows by the implicit stiffly stable time-stepping projection method can be ported to run in parallel on any shared-memory multicore system with reasonably good performance. Shared-memory multicore architecture is the state-of-the-art hardware environment for modern microprocessors, found in supercomputer compute nodes, workstations, and personal desktops and laptops. This work modifies the original semi-implicit stiffly stable time-stepping projection method to make it fully implicit. With the modified method, the approximation of the incompressible Navier–Stokes (INS) equations is broken into a sequence of steps. Each step consists of three stages, which handle the advection, pressure and viscous terms of the INS equations, respectively. Each stage is governed by its own equation(s), whose discretization transforms the partial differential equation(s) into a system of linear algebraic equations, written in general matrix form as Ax = b. The incomplete LU factorization with zero fill-in (ILU(0)) is used to precondition Ax = b, and the generalized minimal residual (GMRES) method is used to solve it. Using Kovasznay and Pearson vortex flows as reference problems for steady and unsteady incompressible flow, respectively, profiling shows that for both flows more than 99% of the total simulation time is spent constructing the ILU(0) preconditioner and running GMRES; these two components were therefore singled out for OpenMP parallelization. Benchmarked on a 16-core compute node, the simulations of Kovasznay and Pearson vortex flows achieve mean parallel efficiencies of 72% and 73%, respectively.
When all 16 CPU cores are used, the simulations of Kovasznay and Pearson vortex flows are accelerated by factors of 8.0 and 8.17, respectively. The performance and efficiency of the OpenMP implementation also depend on the hardware: running the Kovasznay flow simulation on a 12-core workstation gives a mean parallel efficiency of 89%, compared with the 72% achieved on the 16-core compute node.
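The reported figures can be related using the standard definitions of speedup and parallel efficiency. The abstract does not state its formulas, so the symbols below (T_1 and T_p for run times on 1 and p cores, S_p for speedup, E_p for efficiency, f for the serial fraction) are assumptions:

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p}

E_{16}^{\text{Kovasznay}} = \frac{8.0}{16} = 0.50, \qquad
E_{16}^{\text{Pearson}} = \frac{8.17}{16} \approx 0.51

S_{16} \le \frac{1}{f + (1 - f)/16}
       = \frac{1}{0.01 + 0.99/16} \approx 13.9 \quad (f = 0.01)
```

The efficiency at 16 cores (about 50%) is thus lower than the reported means of 72% and 73%, which presumably average over runs with fewer cores, where efficiency is higher. Taking f as the share of run time outside ILU(0) and GMRES (under 1%), Amdahl's law bounds the 16-core speedup at about 13.9 or more, so the observed factors of about 8 suggest that most of the remaining losses lie within the parallelised kernels rather than in the unparallelised remainder.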

Articles published on Compiler Directives

Evaluating Performance Portability with the CMS Heterogeneous Pixel Reconstruction code

AutoScaleDSE: A Scalable Design Space Exploration Engine for High-Level Synthesis

Toward Ultrahigh-Resolution E3SM Land Modeling on Exascale Computers

A 3D transient CFD model to predict heat and moisture transfer in on-farm stored grain silo through parallel computing using compiler directives: Impact of discretization methods on solution efficacy

A New Multi-Target Compiler Architecture for Edge-Devices and Cloud Management

Code Transformation Impact on Compiler-based Optimization: A Case Study in the CMSSW

High‐performance SIMD modular arithmetic for polynomial evaluation

A Layered Mapping of Ada 202X to OpenMP

Correct program parallelisations

Extending the OpenCHK Model with advanced checkpoint features

Debugging Parallel Programs in DVM-System

The Using of DVM-System for Developing of a Program for Calculations of the Problem of Radiation Magnetic Gas Dynamics and Research of Plasma Dynamics in the QSPA Channel

Investigation of Data Dependencies by Dynamic Analysis of Sapfor

Progress in Dvm-System

Numerical multi-loop integration on heterogeneous many-core processors

Automatic Port to OpenACC/OpenMP for Physical Parameterization in Climate and Weather Code Using the CLAW Compiler

OpenACC acceleration for the [formula omitted] algorithm in Nek5000

OpenMP parallel implementation of stiffly stable time-stepping projection/GMRES(ILU(0)) implicit simulation of incompressible fluid flows on shared-memory, multicore architecture

Solving Applied Problems Using the DVM-System

Parallelization of Jump Flood-Cut Method in Shared Memory Multicore Systems