Abstract

Cloud elasticity allows users to acquire resources on demand and release idle resources when they are no longer needed. This feature has attracted more and more web service providers to deploy their latency-critical, user-facing applications on cloud platforms. For web service providers facing fluctuating workloads, scaling their server clusters over time can reduce system expenditure without violating service quality. Therefore, many cloud platforms now offer automatic scaling strategies based on threshold rules to help web service providers reduce expenditure. However, building threshold-based rules requires expertise, and such reactive scaling strategies cannot guarantee low and consistent tail latencies. For proactive scaling strategies that depend on prediction, random user behavior degrades prediction accuracy. In this paper, we propose a reinforcement learning (RL) based proactive strategy for scaling a mixed cluster composed of a variety of cloud instance types. Ensuring the availability and quality of reward signals is the main challenge for algorithms based on standard RL. We design a reward function that balances service cost, service quality, and other parameters that affect decision-making, assigning each parameter a weight according to its influence on decisions. To avoid the state-space explosion caused by fluctuating workloads and varied server statuses, we discretize the continuous state space of our model. Experimental results on TailBench show that our Q-learning based scaling method maintains low and consistent tail latencies while incurring lower costs than three common baselines.
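
As context for the approach summarized above, the following is a minimal sketch of how a discretized state, a weighted cost/quality reward, and a tabular Q-learning update could fit together. The bucket sizes, the weights (`cost_weight`, `latency_weight`), the SLO threshold, and the three-action scaling set are illustrative assumptions for this sketch, not the paper's actual parameters or reward form.

```python
import random
from collections import defaultdict

# Hypothetical discretization: bucket continuous observations into coarse bins
# so the tabular Q-table stays small despite fluctuating workloads.
def discretize(workload_rps, cluster_size, p99_latency_ms):
    workload_bin = min(int(workload_rps // 100), 9)   # 10 workload levels
    latency_bin = min(int(p99_latency_ms // 50), 5)   # 6 tail-latency levels
    return (workload_bin, cluster_size, latency_bin)

# Illustrative weighted reward: penalize instance cost and tail-latency
# violations; cost_weight / latency_weight are placeholder values.
def reward(instance_cost, p99_latency_ms, slo_ms,
           cost_weight=0.4, latency_weight=0.6):
    latency_penalty = max(0.0, p99_latency_ms - slo_ms) / slo_ms
    return -(cost_weight * instance_cost + latency_weight * latency_penalty)

ACTIONS = (-1, 0, +1)   # remove one instance, hold, add one instance
q_table = defaultdict(float)

def choose_action(state, epsilon=0.1):
    # Epsilon-greedy exploration over the scaling actions.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])

def update(state, action, r, next_state, alpha=0.1, gamma=0.9):
    # Standard tabular Q-learning update.
    best_next = max(q_table[(next_state, a)] for a in ACTIONS)
    q_table[(state, action)] += alpha * (r + gamma * best_next
                                         - q_table[(state, action)])
```

Discretizing the state in this way trades some precision for a Q-table small enough to learn from the limited reward signal available in a live scaling loop.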
