Abstract

Predicting the execution time of parallel applications in High Performance Computing (HPC) clusters serves several objectives, including helping developers find areas of code that require fine tuning, designing better job schedulers to increase cluster utilization, and detecting system bottlenecks. We present a statistical approach to predicting parallel application execution times based on empirical analyses of an application's execution times for small input sizes and the time spent in each phase of execution. We model the execution time of each phase of an application by selecting a suitable kernel from a collection of well-known benchmark kernels. To predict the application's execution time for a larger input, the matching kernels are used to estimate the execution times of the application's major phases, and a regression approach is then used to estimate the overall execution time. Prior approaches required determining an application's characteristics by extracting instruction traces, instrumenting the application code with timestamps, performing static code analysis, or building accurate simulation models. In contrast, our approach requires only a few short executions of the application (each taking less than 50 seconds) to collect runtime profile data, which are used to match application phases to kernels via statistical analyses and to produce accurate execution time predictions for parallel scientific applications. We evaluate our methodology using three well-known parallel scientific applications: SMG2000, SNAP, and HPCG. Our prediction errors range from 1% to 15%.
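As a rough illustration of the extrapolation idea (not the paper's actual implementation), the sketch below fits a simple regression to phase timings collected from a few small-input runs and extrapolates to a larger input. The input sizes, timings, and the power-law scaling model are hypothetical assumptions; the paper's method additionally matches each phase to a benchmark kernel before regressing.

```python
# Minimal sketch: regression-based extrapolation of one phase's execution time.
# Sizes, timings, and the power-law model are hypothetical placeholders.
import numpy as np

# Hypothetical profile data: (input size n, measured phase time in seconds)
# from a few short runs, each well under 50 seconds.
sizes = np.array([16, 32, 64, 128], dtype=float)
times = np.array([0.8, 3.1, 12.5, 49.0])   # roughly O(n^2) growth

# Fit a power-law model t = c * n^k by linear regression in log-log space.
k, log_c = np.polyfit(np.log(sizes), np.log(times), deg=1)
c = np.exp(log_c)

def predict_time(n: float) -> float:
    """Extrapolate the phase's execution time for a larger input size n."""
    return c * n ** k

print(f"fitted model: t ~ {c:.3g} * n^{k:.2f}")
print(f"predicted time for n=512: {predict_time(512):.1f} s")
```

In the full approach, a separate model of this kind would be built per phase (guided by the matched kernel's scaling behavior), and the per-phase predictions would be combined to estimate the overall execution time.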
