Enabling Fair Pricing on High Performance Computer Systems with Node Sharing

Alex D Breslow, Lingjia Tang, Jason Mars, Ananta Tiwari, Martin Schulz, Laura Carrington

doi:10.1155/2014/906454

Abstract

Co-location, where multiple jobs share compute nodes in large-scale HPC systems, has been shown to increase aggregate throughput and energy efficiency by 10–20%. However, system operators disallow co-location because of fair-pricing concerns, i.e., the lack of a pricing mechanism that accounts for performance interference from co-running jobs. In the current pricing model, application execution time determines the price, so the minority of users whose jobs suffer from co-location pay unfair prices. This paper presents POPPA, a runtime system that enables fair pricing by delivering precise online interference detection, thereby facilitating the adoption of co-location on supercomputers. POPPA leverages a novel shutter mechanism, a cyclic, fine-grained interference sampling mechanism that accurately deduces the interference between co-runners, to provide unbiased pricing of jobs that share nodes. POPPA quantifies inter-application interference within 4% mean absolute error on a variety of co-located benchmark and real scientific workloads.

Highlights

Supercomputers typically have hundreds to thousands of users and consist of tens to thousands of individual servers connected over a high-speed optical interconnect.
We present a new pricing model for HPC clusters, based on the Persistent Online Precise Pricing Agent (POPPA), that provides fair pricing to users.
The main loop consists of the three core operations of Algorithm 3: measuring the instructions per cycle (IPC) of the application just prior to the shutter, issuing the shutter and measuring the IPC of the application during that window, and measuring the IPC of the application immediately following the shutter.
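The three measurements in the main loop can be turned into an interference estimate. The following is a minimal sketch, not the paper's Algorithm 3. It assumes that co-runners are paused during the shutter window, so the IPC measured there approximates interference-free performance, and that the co-located IPC is the average of the readings just before and just after the shutter. The names `ShutterSample`, `sample_degradation`, and `estimate_degradation` are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Iterable


@dataclass
class ShutterSample:
    """One shutter cycle: three IPC readings for the application of interest."""
    ipc_before: float  # IPC just prior to the shutter (co-runners active)
    ipc_during: float  # IPC during the shutter window (ASSUMED: co-runners paused)
    ipc_after: float   # IPC immediately following the shutter (co-runners active)


def sample_degradation(sample: ShutterSample) -> float:
    """Fractional slowdown for one cycle: 1 - IPC_co-located / IPC_alone.

    The co-located IPC is taken as the mean of the before/after readings,
    which is an assumption of this sketch.
    """
    if sample.ipc_during <= 0:
        raise ValueError("IPC during the shutter must be positive")
    ipc_colocated = (sample.ipc_before + sample.ipc_after) / 2.0
    return 1.0 - ipc_colocated / sample.ipc_during


def estimate_degradation(samples: Iterable[ShutterSample]) -> float:
    """Average the per-cycle slowdown over all shutter cycles of a job."""
    return mean(sample_degradation(s) for s in samples)
```

Averaging over many short cycles is what makes the sampling cyclic and fine-grained: each shutter perturbs the co-runners only briefly, while the mean smooths out phase changes in the application.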

Summary

Introduction

Supercomputers typically have hundreds to thousands of users and consist of tens to thousands of individual servers connected over a high-speed optical interconnect. We start by examining the accounting and allocation model found in the United States Department of Energy Office of Science INCITE program [28] and the National Science Foundation XSEDE program [20], two of the largest U.S. programs that provide resources to the general HPC research community. Each of these programs facilitates access to a number of large-scale computing infrastructures. Regardless of what mechanisms are implemented to improve supercomputer performance, energy efficiency, or fault tolerance, they must not undermine the fairness of the pricing scheme.
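To make the fairness concern concrete, here is a small sketch that contrasts time-based charging with an interference-adjusted charge. It is an illustration under stated assumptions, not the pricing formula from the paper. It assumes the charge is proportional to node-hours and that a job measured as slowed by a fraction `degradation` should pay only for the time it would have needed when running alone. The function names and the rate parameter are hypothetical.

```python
def time_based_charge(node_hours: float, rate_per_node_hour: float) -> float:
    """Charge purely by consumed node-hours, regardless of interference."""
    return node_hours * rate_per_node_hour


def interference_adjusted_charge(
    node_hours: float, rate_per_node_hour: float, degradation: float
) -> float:
    """Discount the charge by the measured fractional slowdown.

    ASSUMPTION: if co-location slowed the job by `degradation`
    (0 = no slowdown), its interference-free runtime would have been
    node_hours * (1 - degradation), and the user pays for that time only.
    """
    if not 0.0 <= degradation < 1.0:
        raise ValueError("degradation must be in [0, 1)")
    return time_based_charge(node_hours, rate_per_node_hour) * (1.0 - degradation)
```

Under these assumptions, a job with no measured interference pays the same under both schemes, while a job that was slowed by its co-runners is not billed for the extra time.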



Journal: Scientific Programming
Publication Date: Jan 1, 2014
Citations: 5
License type: CC BY 3.0


Similar Papers

Enabling fair pricing on HPC systems with node sharing
Alex D Breslow ... Martin Schulz
17 Nov 2013

DWPE, a new data center energy-efficiency metric bridging the gap between infrastructure and workload
Torsten Wilde ... Axel Auweter
01 Jul 2014

Energy Efficiency Models for Scientific Applications on Supercomputers
Mark Endrei
17 Jan 2020

Trends in Data Locality Abstractions for HPC Systems
Didem Unat ...
IEEE Transactions on Parallel and Distributed Systems | VOL. 28
01 Oct 2017
