Online Tuning of EASY-Backfilling using Queue Reordering Policies

Eric Gaussier, Denis Trystram, Valentin Reis, Jerome Lelong

doi:10.1109/tpds.2018.2820699

Abstract

The EASY-FCFS heuristic is the basic building block of job scheduling policies in most parallel High Performance Computing (HPC) platforms. Despite its simplicity and its guarantee of no job starvation, it can still be improved on a per-system basis. Such tuning is difficult because of non-linearities in the scheduling process. This paper considers an online approach to the automatic tuning of the EASY heuristic for HPC platforms. More precisely, we consider the problem of selecting a reordering policy for the job queue under several feedback modes. We show, via a comprehensive experimental validation on actual logs, that periodic simulation of historical data can be used to recover existing in-hindsight results that divide the average waiting time by almost two. This result holds even when the simulator results are noisy. Moreover, we show that good performance can still be obtained without a simulator, under what is called bandit feedback, where we can only observe the performance of the algorithm that was picked on the live system. Indeed, a simple multi-armed bandit algorithm can reduce the average waiting time by 40 percent.
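To make the bandit feedback setting concrete, here is a minimal sketch of policy selection when only the waiting time of the policy actually deployed can be observed. The abstract does not say which multi-armed bandit algorithm is used, so epsilon-greedy is swapped in here purely as an illustration. The policy names ("FCFS", "SPF", "LPF"), the toy environment, and its waiting times are hypothetical stand-ins, not the paper's policies or results.

```python
import random


def make_toy_environment(seed=0):
    """Return a function giving a noisy average waiting time per policy.

    Hypothetical stand-in for a live system: the true means below are
    invented for illustration only.
    """
    rng = random.Random(seed)
    true_mean_wait = {"FCFS": 100.0, "SPF": 55.0, "LPF": 140.0}

    def observe_avg_wait(policy):
        # Bandit feedback: only the deployed policy's waiting time is seen.
        return max(0.0, rng.gauss(true_mean_wait[policy], 5.0))

    return observe_avg_wait


def epsilon_greedy(policies, observe_avg_wait, periods, epsilon=0.1, seed=0):
    """Pick one reordering policy per period, minimizing average wait."""
    rng = random.Random(seed)
    counts = {p: 0 for p in policies}
    mean_wait = {p: 0.0 for p in policies}

    for _ in range(periods):
        untried = [p for p in policies if counts[p] == 0]
        if untried:
            choice = untried[0]  # try every policy once first
        elif rng.random() < epsilon:
            choice = rng.choice(policies)  # explore
        else:
            choice = min(policies, key=lambda p: mean_wait[p])  # exploit

        wait = observe_avg_wait(choice)
        counts[choice] += 1
        # Incremental update of the running mean for the chosen policy.
        mean_wait[choice] += (wait - mean_wait[choice]) / counts[choice]

    return mean_wait, counts


environment = make_toy_environment(seed=1)
estimates, pulls = epsilon_greedy(
    ["FCFS", "SPF", "LPF"], environment, periods=500, epsilon=0.1, seed=2
)
```

In the simulator-based feedback mode described in the abstract, every policy could instead be scored each period by replaying historical data, so all running means would be updated rather than only the chosen one.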

Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Oct 1, 2018
Citations: 49
