Shared Memory Programming Model Research Articles

A wealth of important scientific and engineering applications are configured for use on high performance computing architectures using functionality found in the MPI specification. This specification provides application developers with a straightforward means of implementing their ideas for execution on distributed-memory parallel processing computers. OpenMP directives provide a means for operating on shared-memory regions of those computers. With the advent of machines composed of many-core processors, the strict synchronisation required by the bulk synchronous parallel (BSP) communication model can hinder performance increases. This is due to the difficulty of handling load imbalances, reducing the serialisation imposed by blocking communication patterns, overlapping communication with computation and, finally, dealing with increasing memory overheads. The MPI specification provides advanced features such as non-blocking calls or shared memory to mitigate some of these factors. However, applying these features efficiently usually requires significant changes to the application structure. Task parallel programming models are being developed as a means of mitigating the abovementioned issues without requiring extensive changes to the application code. In this work, we present a methodology for developing hybrid applications based on tasks, called hierarchical domain over-decomposition with tasking (HDOT). This methodology overcomes most of the issues found in MPI-only and traditional hybrid MPI+OpenMP applications. Furthermore, by emphasising the reuse of data partition schemes from the process level and applying them at the task level, it enables a natural coexistence between MPI and shared-memory programming models. The proposed methodology shows promising results in terms of programmability and performance measured on a set of applications.
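The idea of reusing one data partition scheme at both the process level and the task level can be illustrated with a small sketch. This is not the authors' implementation: it uses no MPI and simply simulates ranks with a loop, and the names `partition`, `process_block`, and `hierarchical_sum_of_squares` are hypothetical, chosen only for this illustration. The sum of squares is a stand-in for the real per-block computation.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(start, stop, parts):
    """Split [start, stop) into `parts` contiguous blocks of near-equal size."""
    base, extra = divmod(stop - start, parts)
    blocks = []
    lo = start
    for i in range(parts):
        # The first `extra` blocks take one additional element each.
        hi = lo + base + (1 if i < extra else 0)
        blocks.append((lo, hi))
        lo = hi
    return blocks


def process_block(data, lo, hi):
    """Stand-in for the per-block computation (assumption: a sum of squares)."""
    return sum(x * x for x in data[lo:hi])


def hierarchical_sum_of_squares(data, n_ranks, tasks_per_rank):
    """Apply the same partition scheme twice: once per simulated rank,
    then again inside each rank's subdomain to create finer-grained tasks."""
    total = 0
    for rank_lo, rank_hi in partition(0, len(data), n_ranks):
        # Over-decompose the rank's subdomain with the very same function.
        task_blocks = partition(rank_lo, rank_hi, tasks_per_rank)
        with ThreadPoolExecutor() as pool:
            partials = pool.map(lambda b: process_block(data, *b), task_blocks)
            total += sum(partials)
    return total
```

Calling `hierarchical_sum_of_squares(list(range(100)), 4, 8)` gives the same result as a serial sum of squares, because the two-level decomposition covers every element exactly once. In a real hybrid code the outer loop would correspond to MPI processes and the inner blocks to tasks of a tasking runtime.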

This study compares various parallel programming models for a compute-intensive application in atmospheric modeling. Atmospheric modeling deals with predicting the behavior of the atmosphere through the mathematical equations governing atmospheric fluid flows. These equations are nonlinear partial differential equations, which are difficult to solve analytically. The fundamental governing equations of atmospheric motion are therefore discretized into algebraic forms that are solved using numerical methods to obtain flow-field values at discrete points in time and/or space. Solving these equations often requires huge computational resources, which are normally available only on high-speed supercomputers. The shallow water equations provide a useful framework for analyzing the dynamics of large-scale atmospheric flow and for analyzing the various numerical methods that might be applied to solve them. In this study, a finite volume approach is used to discretize these equations, which leads to a number of algebraic equations equal to the number of time instants at which the flow-field values are to be evaluated. The application is evidently embarrassingly parallel, so its parallelization keeps communication overhead low. A high performance compute cluster was employed to solve the equations involved in atmospheric modeling. Using the OpenMP and MPI APIs made it possible to study the behavior of the shared memory programming model and the message passing programming model in the context of such a highly compute-intensive application. It is observed that creating more software threads than there are available hardware threads brings no additional benefit, because the threads must then share the execution resources.
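To make the discretization step more concrete, the following is a minimal sketch of one finite volume time step for the one-dimensional shallow water equations. Everything specific here is an assumption for illustration: the abstract does not state the dimensionality, flux scheme, or boundary conditions used in the study, so this sketch swaps in a simple Lax-Friedrichs update with periodic boundaries. The names `flux` and `lax_friedrichs_step` are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def flux(h, hu):
    """Physical flux of the 1D shallow water equations for state (h, hu)."""
    return hu, hu * hu / h + 0.5 * G * h * h


def lax_friedrichs_step(h, hu, dx, dt):
    """Advance water depth h and momentum hu by one time step.

    Assumptions: uniform cells of width dx, periodic boundaries, and the
    Lax-Friedrichs scheme (not necessarily the scheme used in the study).
    """
    n = len(h)
    fluxes = [flux(h[i], hu[i]) for i in range(n)]
    c = dt / (2.0 * dx)
    new_h = [0.0] * n
    new_hu = [0.0] * n
    # Each cell reads only values from the previous time step, so the
    # iterations of this loop are independent of one another.
    for i in range(n):
        left = (i - 1) % n
        right = (i + 1) % n
        new_h[i] = 0.5 * (h[left] + h[right]) - c * (fluxes[right][0] - fluxes[left][0])
        new_hu[i] = 0.5 * (hu[left] + hu[right]) - c * (fluxes[right][1] - fluxes[left][1])
    return new_h, new_hu
```

Because every cell update depends only on the previous time level, the loop over cells could be divided among threads or processes with very little coordination, which is the property the abstract describes as embarrassingly parallel.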

Articles published on Shared Memory Programming Model

Evaluating performance portability of five shared-memory programming models using a high-order unstructured CFD solver

Parallel Computing-Based State Estimation for Improved Computation Speed

Collectives in hybrid MPI+MPI code: Design, practice and performance

The Method of Improving the Performance of Network Analysis Application for the Whole Power Grid

AllScale API

HDOT — An approach towards productive programming of hybrid applications

Analysis of World Experience in Creating Parallel Computing Systems Designed to Effectively Solve DIS-tasks

Parallelization of Pairwise Alignment and Neighbor-Joining Algorithm in Progressive Multiple Sequence Alignment

Automated Synthesis of Comprehensive Memory Model Litmus Test Suites

Software Speculation on Caching DSMs

Hardware support for message-passing in chip multi-processors

Efficient Embedded Software Migration towards Clusterized Distributed-Memory Architectures

A two-level parallelization method for distributed hydrological models

Parallel and Scalable Short-Read Alignment on Multi-Core Clusters Using UPC++

Different Approaches to Parallelization of Vector Assembly

Performance Analysis of an Embarrassingly Parallel Application in Atmospheric Modeling

DaSH: A benchmark suite for hybrid dataflow and shared memory programming models

DeNovoSync
