Abstract

Co-location, where multiple jobs share compute nodes in large-scale HPC systems, has been shown to increase aggregate throughput and energy efficiency by 10–20%. However, system operators disallow co-location because of fair-pricing concerns, i.e., the need for a pricing mechanism that accounts for performance interference from co-running jobs. Under the current pricing model, application execution time determines the price, so the minority of users whose jobs suffer from co-location pay unfair prices. This paper presents POPPA, a runtime system that enables fair pricing by delivering precise online interference detection, and thereby facilitates the adoption of co-location on supercomputers. POPPA leverages a novel shutter mechanism, a cyclic, fine-grained interference sampling mechanism that accurately deduces the interference between co-runners, to provide unbiased pricing of jobs that share nodes. POPPA quantifies inter-application interference within 4% mean absolute error on a variety of co-located benchmark and real scientific workloads.

Highlights

  • Supercomputers typically have hundreds to thousands of users and consist of tens to thousands of individual servers connected over a high-speed optical interconnect

  • We present a new pricing model for HPC clusters, based on the Persistent Online Precise Pricing Agent (POPPA), that provides fair pricing to users

  • The main loop consists of the three core operations of Algorithm 3: measuring the instructions per cycle (IPC) of the application just prior to the shutter, issuing the shutter and measuring the IPC of the application during that window, and measuring the IPC of the application immediately following the shutter
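
The three IPC measurements in the last highlight can be turned into an interference estimate roughly as follows. This is a minimal sketch, not the paper's Algorithm 3. It assumes that the IPC measured during the shutter approximates the application's interference-free IPC, and that the average of the IPC just before and just after the shutter approximates its co-running IPC. The names `ShutterSample`, `slowdown`, `estimate_interference`, and `discounted_price` are hypothetical, and the proportional discount is only one possible way to map interference to a price.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ShutterSample:
    """One shutter cycle: three IPC readings for the monitored application."""
    ipc_before: float  # IPC just prior to the shutter (co-running)
    ipc_during: float  # IPC while the shutter is in effect
    ipc_after: float   # IPC immediately following the shutter (co-running)


def slowdown(sample: ShutterSample) -> float:
    """Fractional IPC loss attributed to co-runners for one sample.

    Assumption (not from the source): ipc_during stands in for the
    interference-free IPC, and the mean of ipc_before and ipc_after
    stands in for the co-running IPC.
    """
    if sample.ipc_during <= 0.0:
        raise ValueError("IPC during the shutter must be positive")
    co_run_ipc = (sample.ipc_before + sample.ipc_after) / 2.0
    # Clamp at zero so measurement noise never yields negative interference.
    return max(0.0, 1.0 - co_run_ipc / sample.ipc_during)


def estimate_interference(samples: list[ShutterSample]) -> float:
    """Average the per-sample slowdown over all shutter cycles."""
    if not samples:
        raise ValueError("at least one shutter sample is required")
    return mean(slowdown(s) for s in samples)


def discounted_price(base_price: float, interference: float) -> float:
    """Hypothetical pricing rule: discount in proportion to interference."""
    return base_price * (1.0 - interference)


samples = [
    ShutterSample(ipc_before=1.0, ipc_during=2.0, ipc_after=1.0),
    ShutterSample(ipc_before=1.5, ipc_during=2.0, ipc_after=1.5),
]
interference = estimate_interference(samples)
print(interference)                           # 0.375
print(discounted_price(100.0, interference))  # 62.5
```

Averaging the readings on either side of the shutter is a design choice in this sketch: it reduces the chance that a phase change in the application between the two co-running measurements is mistaken for interference.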


Summary

Introduction

Supercomputers typically have hundreds to thousands of users and consist of tens to thousands of individual servers connected over a high-speed optical interconnect. We start by examining the accounting and allocation model found in the United States Department of Energy Office of Science INCITE program [28] and the National Science Foundation XSEDE program [20], two of the largest U.S. programs that provide resources to the general HPC research community. Each of these programs facilitates access to a number of large-scale computing infrastructures. Regardless of what mechanisms are implemented to improve supercomputer performance, energy efficiency or fault tolerance, they must not compromise the fairness of the pricing scheme.

