Abstract

We have developed an environment, based upon robust, existing, open source software, for tuning applications written using MPI, OpenMP or both. The goal of this effort, which integrates the OpenUH compiler and several popular performance tools, is to increase user productivity by providing an automated, scalable performance measurement and optimization system. In this paper we describe our environment, show how these complementary tools can work together, and illustrate the synergies possible by exploiting their individual strengths and combined interactions. We also present a methodology for performance tuning that is enabled by this environment. One of the benefits of using compiler technology in this context is that it can direct the performance measurements to capture events at different levels of granularity and help assess their importance, which we have shown to significantly reduce the measurement overheads. The compiler can also help when attempting to understand the performance results: it can supply information on how a code was translated and whether optimizations were applied. Our methodology combines two performance views of the application to find bottlenecks. The first is a high level view that focuses on OpenMP/MPI performance problems such as synchronization cost and load imbalances; the second is a low level view that focuses on hardware counter analysis with derived metrics that assess the efficiency of the code. Our experiments have shown that our approach can significantly reduce overheads for both profiling and tracing to acceptable levels and limit the number of times the application needs to be run with selected hardware counters. In this paper, we demonstrate the workings of this methodology by illustrating its use with selected NAS Parallel Benchmarks and a cloud resolving code.
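As a rough illustration of the two performance views mentioned above, the sketch below computes a load-imbalance ratio from per-thread (or per-rank) timings and a couple of derived metrics from raw hardware counter values. It is a minimal sketch, not the paper's implementation: the function names, the counter keys (`cycles`, `instructions`, `l2_misses`, `l2_accesses`) and the particular formulas are assumptions chosen for illustration, and the actual environment obtains such data from its measurement tools.

```python
from statistics import mean


def load_imbalance(times):
    """High level view: fraction of the slowest worker's time that the
    average worker spends waiting. 0.0 means perfectly balanced.

    `times` is a hypothetical list of per-thread or per-rank wall times.
    """
    longest = max(times)
    if longest == 0:
        return 0.0
    return (longest - mean(times)) / longest


def derived_metrics(counters):
    """Low level view: combine raw counter values into efficiency ratios.

    `counters` is a hypothetical dict of raw event counts; the key names
    are assumptions and do not correspond to any specific tool's output.
    """
    return {
        # Instructions retired per cycle
        "ipc": counters["instructions"] / counters["cycles"],
        # Share of L2 accesses that missed
        "l2_miss_rate": counters["l2_misses"] / counters["l2_accesses"],
    }


if __name__ == "__main__":
    print(load_imbalance([1.0, 1.0, 1.0, 3.0]))
    print(derived_metrics({
        "cycles": 1000,
        "instructions": 1500,
        "l2_misses": 50,
        "l2_accesses": 200,
    }))
```

A high imbalance ratio points to a region worth examining in the high level view, while a low instructions-per-cycle value or a high miss rate in the same region suggests looking at memory behaviour in the low level view.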

Highlights

  • The difficulty of developing high performance applications has increased greatly with the growth in size and architectural complexity of each new generation of supercomputers

  • A single address space is seen by all the processors/nodes and its global memory is based on a cache-coherent Non-Uniform Memory Access system implemented via the NUMAlink4

  • In this thesis we have presented a methodology for solving performance problems that exploits the capabilities of an integrated tuning environment created in a collaboration between open source compiler developers and performance tools providers

Summary

Introduction

The difficulty of developing high performance applications has increased greatly with the growth in size and architectural complexity of each new generation of supercomputers. Sampling entire applications can yield large amounts of low level information that can be overwhelming for the user. Some processor architectures, such as the Itanium 2 processors [15] and the PowerPC [23], support sampling by providing specialized hardware such as performance monitoring units. PDT [12] is a toolkit, with support for C, C++, Fortran and OpenMP, that was designed to overcome the lack of a portable compiler instrumentation API. It gathers static program information via a parser and represents it in a portable format suitable for use in source code instrumentation. These capabilities play a significant role in reducing the number of instrumentation points, the instrumentation overhead and the size of performance trace files, and in improving a user’s ability to determine the impact of program optimizations.

Contents of the paper
Related work
The OpenUH compiler and Dragon analysis tool
PerfSuite
Tools interactions
Compile time instrumentation
Tuning methodology and selective instrumentation
Description of the methodology
Selective instrumentation analysis
Case studies
Application description
Evaluating selective instrumentation
Performance analysis for the BT MPI benchmark
Performance analysis of the cloud code
Findings
Conclusions and future work