HPAS

Emre Ates, Vitus J Leung, Manuel Egele, Burak Aksar, Yijia Zhang, Jim Brandt, Ayse K Coskun

doi:10.1145/3337821.3337907

Abstract

Modern high performance computing (HPC) systems, including supercomputers, routinely suffer from substantial performance variations. The same application with the same input can show more than 100% performance variation, and such variations reduce efficiency and waste resources. Recent studies have examined performance variability and designed automated methods for diagnosing the anomalies that cause it. These studies either analyze data collected from HPC systems or rely on synthetic reproduction of performance variability scenarios. However, there is no standardized way of creating synthetic anomalies that induce performance variability, so researchers rely on ad hoc methods to reproduce it. This paper addresses the lack of a common method for creating relevant performance anomalies by introducing HPAS, an HPC Performance Anomaly Suite, consisting of anomaly generators for the major subsystems in HPC systems. These easy-to-use synthetic anomaly generators facilitate low-effort evaluation and comparison of analytics methods, as well as of the performance or resilience of applications, middleware, or systems under realistic performance variability scenarios. The paper also analyzes the behavior of the anomaly generators and demonstrates three use cases: (1) performance anomaly diagnosis using HPAS, (2) evaluation of resource management policies under performance variations, and (3) design of applications that are resilient to performance variability.
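To make the idea of a synthetic anomaly generator concrete, the sketch below shows one kind of generator such a suite might include: a CPU-contention generator that keeps one core busy for a target fraction of each time period and idles for the rest. It is a minimal Python illustration under our own assumptions; the function name `cpu_occupy` and its parameters (`utilization`, `duration_s`, `period_s`) are hypothetical and are not taken from HPAS's actual code or interface.

```python
import time


def cpu_occupy(utilization: float, duration_s: float, period_s: float = 0.1) -> int:
    """Hypothetical CPU-contention anomaly generator (not the HPAS API).

    In each period of `period_s` seconds, busy-wait for roughly
    `utilization * period_s` seconds and sleep for the remainder, until
    `duration_s` seconds have elapsed. Returns the number of completed periods.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    if duration_s < 0 or period_s <= 0:
        raise ValueError("duration_s must be >= 0 and period_s must be > 0")

    busy_s = utilization * period_s
    end = time.monotonic() + duration_s
    periods = 0
    # The last period may run past `end` by up to one period.
    while time.monotonic() < end:
        start = time.monotonic()
        # Busy phase: spin to consume CPU cycles on the current core.
        while time.monotonic() - start < busy_s:
            pass
        # Idle phase: give the core back for the rest of the period.
        idle_s = period_s - (time.monotonic() - start)
        if idle_s > 0:
            time.sleep(idle_s)
        periods += 1
    return periods
```

For example, `cpu_occupy(0.8, 10.0)` would keep one core roughly 80% busy for about ten seconds alongside an application under test. Because a single Python thread occupies at most one core, contending for several cores would require launching several such processes, for instance with `multiprocessing`.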

Similar Papers

Proctor: A Semi-Supervised Performance Anomaly Diagnosis Framework for Production HPC Systems
Burak Aksar ... Manuel Egele
01 Jan 2020

Supervised Performance Anomaly Detection in HPC Data Centers
Mohamed Soliman Halawa ... Ana Fernández Vilas
17 Mar 2019

Middleware in Modern High Performance Computing System Architectures
Christian Engelmann ... Stephen L Scott
01 Jan 2007

Machine Learning Predictions for Underestimation of Job Runtime on HPC System
Jian Guo ... Satoshi Matsuoka
01 Jan 2018
