Abstract

For parallel applications, performance variance is a critical issue that can degrade performance and make applications’ behavior difficult to explain. Therefore, users and application developers should be able to detect and diagnose performance variance. Previous detection methods either introduce too much overhead and slow down applications, or rely on nontrivial source code analysis, which is impractical for production-run parallel systems. In this article, we propose <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Vapro</small> , a framework for detecting and diagnosing performance variance in production-run parallel systems. Our method is based on an observation that most parallel programs contain code snippets that are executed repeatedly with a fixed workload and can be utilized to detect performance variance. We present State Transition Graph (STG) to track program execution and then do light-weight workload analysis on STG to locate performance variance. <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Vapro</small> is able to successfully identify these snippets at runtime even without program source code. To diagnose the discovered variation, <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Vapro</small> uses a progressive diagnosis method based on a hybrid model combining variance breakdown and statistical analysis. According to evaluating results, <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Vapro</small> 's performance overhead is only 1.38% on average. <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Vapro</small> can identify performance variance in real applications caused by hardware issues, such as memory and IO. The standard deviation of the execution time is decreased by up to 73.5% when the identified variance is fixed. <sc xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">Vapro</small> achieves 30.0% larger detection coverage than the state-of-the-art variance detection approach based on source code analysis.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call