A Theory of Auto-Scaling for Resource Reservation in Cloud Services

Konstantinos Psychas, Javad Ghaderi

doi:10.1287/stsy.2021.0091

Abstract

We consider a distributed server system consisting of a large number of servers, each with limited capacity on multiple resources (CPU, memory, etc.). Jobs with different rewards arrive over time and require certain amounts of resources for the duration of their service. When a job arrives, the system must decide whether to admit or reject it and, if admitted, on which server to schedule it. The objective is to maximize the expected total reward received by the system. This problem is motivated by the control of cloud computing clusters, in which jobs are requests for virtual machines (VMs) or containers that reserve resources for various services, and rewards represent the service priority of requests or the price paid per time unit of service. We study this problem in an asymptotic regime where the number of servers and the jobs’ arrival rates scale by a factor L, as L becomes large. We propose a resource reservation policy that asymptotically achieves at least 1/2, and under a certain monotone property of jobs’ rewards and resources, at least [Formula: see text] of the optimal expected reward. The policy automatically scales the number of VM slots for each job type as the demand changes and decides in advance on which servers the slots should be created, without knowledge of the traffic rates.

Highlights

There has been a rapid migration of computing, storage, applications, and other services to the cloud.
We start by choosing the virtual machine (VM) types, considering that the VM instances offered by major cloud providers such as Google Cloud are mainly optimized for memory, CPU, or regular use.
We propose a VM reservation and admission policy that operates in an online manner and can guarantee at least 1/2 of the optimal expected reward.

Summary

Introduction

There has been a rapid migration of computing, storage, applications, and other services to the cloud. A key challenge for cloud service providers is to efficiently support a wide range of services on their physical platform. They usually offer quality of service (QoS) guarantees (in service level agreements) (Amazon Web Services 2020d) for clients’ applications and services, and allow the number of VM instances to scale up or down with demand to ensure that the QoS guarantees are met. Various predictive and reactive schemes have been proposed for dynamically allocating VMs to different services (Mao et al. 2010, Roy et al. 2011, Han et al. 2012, Jiang et al. 2013, Ghobaei-Arani et al. 2018, Qu et al. 2018); they mostly assume a dedicated hosting model in which the VMs of each application run on a dedicated set of servers. Such models do not consider the potential consolidation of VMs in servers, which is known to significantly improve efficiency and scalability (Song et al. 2013, Corradi et al. 2014).

Results

Discussion

Conclusion