Abstract

Inferring the expected performance of parallel applications is getting harder than ever; applications need to be modeled for restricted or nonexistent systems, and performance analysts are required to identify and extrapolate their behavior using only the available resources. Prediction models can be based on detailed knowledge of the application algorithms, or on blindly trying to extrapolate measurements from existing architectures and codes. This paper describes the work done to define an intermediate methodology where the combination of (a) essential knowledge about fundamental factors in parallel codes and (b) detailed analysis of the application behavior at low core counts on current platforms guides the modeling effort to estimate behavior at very large core counts. Our methodology integrates several components, such as an instrumentation package, visualization tools, simulators, and analytical models, together with very high-level information from the application running on production systems, to build a performance model.
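The fundamental factors the abstract alludes to are commonly expressed as a multiplicative decomposition of parallel efficiency into load balance and communication (transfer) efficiency, computed from per-process useful-computation times extracted from a trace. The sketch below illustrates that decomposition; the function name and the exact formulas are an assumption for illustration, not the authors' code.

```python
# Hypothetical sketch of the fundamental-factor decomposition: parallel
# efficiency split into load balance and transfer efficiency, derived from
# per-process useful-computation times as would be extracted from a trace.
import numpy as np

def efficiency_factors(useful_time, total_time):
    """useful_time: per-process time spent in useful computation (seconds).
    total_time: wall-clock time of the parallel region (seconds)."""
    useful = np.asarray(useful_time, dtype=float)
    load_balance = useful.mean() / useful.max()   # 1.0 = perfectly balanced
    comm_eff = useful.max() / total_time          # fraction not lost to transfer/wait
    parallel_eff = useful.mean() / total_time     # product of the two factors
    return load_balance, comm_eff, parallel_eff

# Example: 4 processes, slightly imbalanced, region lasted 12 s.
lb, ce, pe = efficiency_factors([10.0, 9.5, 9.0, 9.5], 12.0)
```

Because the factors multiply (`parallel_eff == load_balance * comm_eff`), each can be measured, fitted, and extrapolated independently, which is what makes the decomposition useful for prediction.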

Highlights

  • Within the race toward exascale computing, inferring the scaling capacity of current parallel codes has become essential [1]

  • We describe a methodology to collect primary components of current parallel codes and infer their expected behavior when scaled to larger core counts

  • To extrapolate the expected parallel efficiency, the approach extracted basic knowledge from traces obtained from runs using a low number of processes


Summary

Introduction

Within the race toward exascale computing, inferring the scaling capacity of current parallel codes has become essential [1]. This proposal starts by capturing detailed data from traces of very few runs of the parallel code at low core counts, on machines that are in production, i.e., without additional tuning for exclusive access. Significant performance components such as load balance and transfer can be measured, fitted, and extrapolated to large core counts. As noted by Labarta, with few points to fit, the blind use of functions with many parameters would lead to underdetermined systems with many possible solutions in the explored core count range. These solutions may show huge differences in performance when extrapolated to large core counts. It is our belief that low core count runs provide enough information on the fundamental behavior of a parallel code, and can complement existing time profile reports of the different routines.
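The fitting step above can be sketched with a deliberately simple model: with only a handful of low-core-count measurements, a one-parameter Amdahl fit is fully determined, whereas a function with many free parameters fitted to the same few points would be underdetermined. The model choice and function names here are an illustrative assumption, not the paper's actual fitting procedure.

```python
# Minimal sketch (an assumption, not the authors' model) of fitting a
# low-parameter scaling model to few low-core-count points and then
# extrapolating far outside the measured range.
import numpy as np

def fit_serial_fraction(procs, speedup):
    """Least-squares fit of Amdahl's law S(P) = 1 / (f + (1 - f)/P).
    Linearized: 1/S - 1/P = f * (1 - 1/P), then solved for f."""
    P = np.asarray(procs, dtype=float)
    S = np.asarray(speedup, dtype=float)
    x = 1.0 - 1.0 / P
    y = 1.0 / S - 1.0 / P
    return float(np.dot(x, y) / np.dot(x, x))  # one unknown -> unique solution

def predict_speedup(f, procs):
    P = np.asarray(procs, dtype=float)
    return 1.0 / (f + (1.0 - f) / P)

# Synthetic low-core-count measurements generated with a 2% serial fraction.
procs = np.array([2, 4, 8, 16])
measured = predict_speedup(0.02, procs)
f = fit_serial_fraction(procs, measured)
# Extrapolate to a core count far outside the fitted range.
s_4096 = predict_speedup(f, [4096])[0]
```

A richer model (e.g., adding a communication term that grows with P) can capture more behavior, but each extra parameter demands more, and more widely spread, measurement points to keep the fit well determined.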

Identify Structure
Phase Performance Analysis
Scalability Prediction
Experimental Evaluation
Model Fitting
Validation of Results
Projection for large core counts
Additional enhancements
Impact of noise in communication
Impact of noise in computation
Related Work
Findings
Conclusions
