Abstract

SummaryAs parallel applications running on large‐scale computing systems become increasingly memory constrained, the ability to attribute memory usage to the various components of the application is becoming increasingly important. We present the design and implementation of memnesia, a novel memory usage profiler for parallel and distributed message‐passing applications. Our approach captures both application– and message‐passing library–specific memory usage statistics from unmodified binaries dynamically linked to a message‐passing communication library. Using microbenchmarks and proxy applications, we evaluated our profiler across three Message Passing Interface (MPI) implementations and two hardware platforms. The results show that our approach and the corresponding implementation can accurately quantify memory resource usage as a function of time, scale, communication workload, and software or hardware system architecture, clearly distinguishing between application and MPI library memory usage at a per‐process level. With this new capability, we show that job size, communication workload, and hardware/software architecture influence peak runtime memory usage. In practice, this tool provides a potentially valuable source of information for application developers seeking to measure and optimize memory usage.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call