Modeling and Analyzing Waiting Policies for Cloud-Enabled Schedulers

Pradeep Ambati, Prashant Shenoy, Noman Bashir, David Irwin

doi:10.1109/tpds.2021.3086270

Open Access

https://doi.org/10.1109/tpds.2021.3086270

Abstract

Cloud platforms have popularized the Infrastructure-as-a-Service (IaaS) purchasing model, which enables users to rent computing resources on demand to execute their jobs. However, buying fixed resources is still much cheaper than renting when their utilization is high. Thus, to optimize cost, users must decide, based on their workload, how many fixed resources to provision and how many to rent on demand. In this article, we introduce the concept of a waiting policy for cloud-enabled schedulers and show that the optimal cost depends on it. The waiting policy explicitly controls how long jobs wait for fixed resources; waiting is a choice rather than a necessity, because cloud platforms provide the illusion of infinite scalability and jobs never strictly need to wait. A waiting policy is the dual of a scheduling policy: while a scheduling policy determines which jobs should run when fixed resources are available, a waiting policy determines which jobs should wait when fixed resources are not available. We define multiple waiting policies and develop simple and general analytical models to reveal the tradeoffs they make among fixed resource provisioning, cost, and job waiting time. We evaluate the impact of different waiting policies on a real year-long batch workload consisting of 14M jobs run on a 14.3k-core cluster. We show that a compound waiting policy, which forces jobs with long running times or short waiting times to wait for fixed resources, offers the best tradeoff. The policy decreases both the cost (by 5 percent) and the mean job waiting time (by 7×) compared to the current cluster, and also decreases the cost (by 43 percent) compared to renting on-demand resources, in exchange for a modest increase in mean job waiting time (at 1.74 hours).
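The compound waiting policy described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Job` fields, the function names, and both threshold values are hypothetical, and the sketch assumes the scheduler already has an estimate of each job's running time and of how long it would wait for fixed resources.

```python
from dataclasses import dataclass


@dataclass
class Job:
    # Hypothetical estimates; the abstract does not say how they are obtained.
    est_runtime_hours: float  # estimated running time of the job
    est_wait_hours: float     # estimated wait until fixed resources free up


def should_wait(job: Job, long_runtime_hours: float, short_wait_hours: float) -> bool:
    """Compound rule: wait if the job is long-running OR its wait would be short."""
    is_long_running = job.est_runtime_hours >= long_runtime_hours
    has_short_wait = job.est_wait_hours <= short_wait_hours
    return is_long_running or has_short_wait


def place(job: Job, long_runtime_hours: float = 4.0, short_wait_hours: float = 0.5) -> str:
    """Return where the job runs when no fixed resources are currently free.

    The default thresholds are illustrative assumptions, not values from the paper.
    """
    if should_wait(job, long_runtime_hours, short_wait_hours):
        return "fixed"      # queue the job for fixed resources
    return "on-demand"      # run immediately on rented resources


if __name__ == "__main__":
    jobs = [Job(10.0, 3.0), Job(0.2, 0.1), Job(0.2, 3.0)]
    for job in jobs:
        print(job, "->", place(job))
```

Under these assumed thresholds, a long job waits even if the queue is long, a short job waits only when the queue is short, and a short job facing a long queue is sent to on-demand resources.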