Abstract

Performance analysis and optimization grow increasingly challenging as large computing systems increase in scale and complexity. The need for a holistic system analysis becomes apparent when traditional approaches do not collect the information required to investigate performance penalties caused by shared system resources. We have developed a distributed approach that collects and processes performance data from shared system resources. We call our software implementation of this approach Dataheap and have integrated it with a traditional program tracing facility. In this paper we describe the needs that have driven this development as well as connections to related projects. Dataheap is based on a threaded server, distributed agents that collect performance data, a storage backend that makes use of different databases, and access libraries that allow external systems to retrieve current and historic performance data. The server processes incoming performance data and allows secondary metrics to be created on the fly, which helps to transform system-specific characteristics into standard performance metrics. Finally, we briefly illustrate how this approach has enhanced our performance debugging capabilities as well as our research on energy-efficient computing.
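
To make the idea of a secondary metric concrete, the following minimal sketch derives a per-second rate from a cumulative raw counter, as one might do to turn a device-specific byte counter into a standard bandwidth metric. It is an illustration only and does not reflect the actual Dataheap interfaces: the names `Sample` and `rate`, and the sample values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class Sample:
    """One measurement: a timestamp in seconds and a metric value (hypothetical type)."""
    timestamp: float
    value: float


def rate(samples: Iterable[Sample]) -> Iterator[Sample]:
    """Derive a secondary metric: the per-second rate of a cumulative counter.

    Each output sample carries the timestamp of the later of two
    consecutive input samples. Pairs with a non-positive time delta
    are skipped.
    """
    prev: Optional[Sample] = None
    for cur in samples:
        if prev is not None:
            dt = cur.timestamp - prev.timestamp
            if dt > 0:
                yield Sample(cur.timestamp, (cur.value - prev.value) / dt)
        prev = cur


# Made-up cumulative byte counter, as an agent might report it.
raw = [Sample(0.0, 0.0), Sample(1.0, 1000.0), Sample(3.0, 5000.0)]
derived = list(rate(raw))  # bytes per second between consecutive samples
```

Because `rate` is a generator, it can be applied to a stream of incoming samples without buffering them, which fits the on-the-fly processing described above.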
