Abstract

Scientific applications often contain large, computationally intensive, and irregular parallel loops or tasks that exhibit stochastic behavior, leading to load imbalance. Load imbalance often manifests during the execution of parallel scientific applications on large and complex high performance computing (HPC) systems. The extreme scale of HPC systems on the road to Exascale computing only exacerbates the performance loss due to load imbalance. Dynamic loop self-scheduling (DLS) techniques are instrumental in improving the performance of scientific applications on HPC systems via load balancing. Selecting the DLS technique that yields the best performance for different problem and system sizes requires a large number of exploratory experiments. No theoretical model has yet been identified that can predict the scheduling technique that yields the best performance for a given problem and system. Therefore, simulation is the most appropriate approach for conducting such exploratory experiments in a reasonable amount of time. However, conducting realistic and trustworthy simulations of application performance under different configurations is challenging. This work devises an approach to realistically simulate computationally intensive scientific applications that employ DLS and execute on HPC systems. The proposed approach minimizes the sources of uncertainty in the simulative experiment results by bridging the native and simulative experimental approaches. A new method is proposed to capture the variation of application performance between different native executions. Several approaches to representing the application tasks (or loop iterations) are compared to establish their influence on the simulative application performance. A novel simulation strategy is introduced that applies the proposed approach to transform a native application code into simulative code. The native and simulative performance of two computationally intensive scientific applications that employ eight task scheduling techniques (static, nonadaptive dynamic, and adaptive dynamic) are compared to evaluate the realism of the proposed simulation approach. A comparison of the performance characteristics extracted from the native and simulative executions shows that the proposed approach captures most of the performance characteristics of interest. This work establishes the importance of simulations that realistically predict the performance of DLS techniques for different application and system configurations.

Highlights

  • Scientific applications are complex, large, and contain irregular parallel loops that often exhibit stochastic behavior

  • Dynamic loop self-scheduling (DLS) techniques provide dynamic load balancing for computationally intensive scientific applications on HPC systems

  • We show that it is possible to realistically simulate the performance of scientific applications on high-performance computing (HPC) systems

Introduction

Scientific applications are complex, large, and contain irregular parallel loops (or tasks) that often exhibit stochastic behavior. The use of efficient loop scheduling techniques, from fully static to fully dynamic, in computationally intensive applications is crucial for improving their performance on high performance computing (HPC) systems, which is often degraded by load imbalance. Dynamic loop self-scheduling (DLS) is an effective scheduling approach employed to improve the performance of computationally intensive scientific applications via dynamic load balancing. This work first covers the relevant background on dynamic load balancing via dynamic loop self-scheduling, including the DLS techniques selected for the present work. The essential difference between static and dynamic loop scheduling is the time at which the scheduling decisions are taken. Static scheduling techniques, such as block, cyclic, and block-cyclic [8], divide and assign the loop iterations (or tasks) across the processing elements (PEs) before the application executes. Block scheduling is considered in this work and is denoted as STATIC.
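
The contrast between static and dynamic loop scheduling can be sketched with standard OpenMP schedule clauses, where schedule(static) corresponds to block scheduling (STATIC), schedule(dynamic, 1) to one-iteration-at-a-time self-scheduling, and schedule(guided) to guided self-scheduling with decreasing chunk sizes. This is a minimal illustrative sketch, not the applications or scheduling library studied in this work; the irregular work() kernel and the loop bound N are invented assumptions that merely mimic a stochastic, imbalanced loop.

    /* Minimal sketch: static vs. dynamic loop scheduling in OpenMP.
     * Compile with, e.g., gcc -fopenmp sched_sketch.c */
    #include <stdio.h>
    #include <omp.h>

    /* Illustrative irregular per-iteration cost (assumption, not the
     * paper's workload): iteration length varies pseudo-randomly. */
    static double work(int i) {
        double x = 0.0;
        int n = 1000 + (i * 7919) % 50000;
        for (int k = 0; k < n; k++)
            x += (double)k / (double)(i + 1);
        return x;
    }

    int main(void) {
        const int N = 4096;
        double sum = 0.0, t0, t1;

        /* STATIC (block): iterations are divided into equal blocks
         * and assigned to threads before the loop executes. */
        t0 = omp_get_wtime();
        #pragma omp parallel for schedule(static) reduction(+:sum)
        for (int i = 0; i < N; i++) sum += work(i);
        t1 = omp_get_wtime();
        printf("static : %.3f s (sum=%g)\n", t1 - t0, sum);

        /* Self-scheduling: each idle thread grabs one iteration
         * at a time at run time. */
        sum = 0.0; t0 = omp_get_wtime();
        #pragma omp parallel for schedule(dynamic, 1) reduction(+:sum)
        for (int i = 0; i < N; i++) sum += work(i);
        t1 = omp_get_wtime();
        printf("dynamic: %.3f s (sum=%g)\n", t1 - t0, sum);

        /* Guided self-scheduling: chunk sizes shrink as the
         * loop progresses. */
        sum = 0.0; t0 = omp_get_wtime();
        #pragma omp parallel for schedule(guided) reduction(+:sum)
        for (int i = 0; i < N; i++) sum += work(i);
        t1 = omp_get_wtime();
        printf("guided : %.3f s (sum=%g)\n", t1 - t0, sum);
        return 0;
    }

Under an imbalanced workload such as the one sketched above, the dynamic and guided variants typically finish sooner than the static one, at the cost of run-time scheduling overhead; this trade-off between load balance and scheduling overhead is precisely what the DLS techniques considered in this work navigate.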
