Multi-Tier Workload Consolidations in the Cloud: Profiling, Modeling and Optimization

Kejiang Ye, Cheng-Zhong Xu, Yang Wang, Haiying Shen

doi:10.1109/tcc.2020.2975788

Abstract

Reducing tail latency is increasingly important for improving the user-perceived service experience. User-facing, latency-sensitive cloud applications typically contain multiple interactive tiers (e.g., Web, App, Database) running in different virtual machines (VMs) with complex interaction patterns. However, previous VM consolidation methods often neglect such interactions between VMs in different tiers, resulting in poor application performance. In this article, we study the consolidation of multi-tier interactive workloads from a new perspective: user-perceived tail latency. We propose a novel profiling-based consolidation methodology that satisfies tail latency requirements while reducing the number of physical machines used. To achieve this goal, we first perform large-scale profiling experiments under various consolidation settings in a KVM-virtualized private cluster to establish empirical performance values. We consider two key factors that affect the tail latency of multi-tier workloads: interference with co-located VMs and interaction between tiers. We model the consolidation of multi-tier workloads as an optimization problem with different objectives and constraints, and derive the consolidation schedule. We implement and evaluate the proposed models and compare them with other methods (i.e., without profiling or without considering interaction influence). Extensive experimental results show that the proposed method reduces tail latency by up to 5X compared with the method without profiling, and by up to 1.3X compared with the method without considering the interaction influence between different tiers.
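
As a rough illustration of how the two factors named above could enter a consolidation decision, the following minimal sketch places VMs greedily under a tail latency bound. It is not the optimization model of the article. The latency values, the linear interference factor, the cross-host penalty, and the names `app_tail_latency` and `consolidate` are all hypothetical assumptions chosen for the example; in the article's approach such numbers would come from profiling experiments.

```python
from typing import Dict, List, Tuple

# Hypothetical "profiled" values, for illustration only (not taken from the article).
BASE_MS = {"web": 20.0, "app": 30.0, "db": 50.0}  # assumed tail latency of each tier when alone
INTERFERENCE = 0.25    # assumed fractional slowdown per co-located VM
CROSS_HOST_MS = 10.0   # assumed penalty when adjacent tiers sit on different hosts
TIERS = ["web", "app", "db"]

VM = Tuple[str, str]   # (application name, tier name)


def app_tail_latency(app: str, placement: Dict[VM, int]) -> float:
    """Estimate one application's end-to-end tail latency under a placement.

    Tiers that have not been placed yet are ignored.
    """
    total = 0.0
    for tier in TIERS:
        if (app, tier) not in placement:
            continue
        host = placement[(app, tier)]
        # Interference: every other VM on the same host slows this tier down.
        neighbours = sum(1 for h in placement.values() if h == host) - 1
        total += BASE_MS[tier] * (1.0 + INTERFERENCE * neighbours)
    # Interaction: adjacent tiers on different hosts pay a communication penalty.
    for upper, lower in zip(TIERS, TIERS[1:]):
        a, b = (app, upper), (app, lower)
        if a in placement and b in placement and placement[a] != placement[b]:
            total += CROSS_HOST_MS
    return total


def consolidate(vms: List[VM], slo_ms: float) -> Dict[VM, int]:
    """Greedy first-fit: put each VM on the lowest-numbered host that keeps
    every application within the tail latency bound, opening a new host
    only when no existing host works."""
    placement: Dict[VM, int] = {}
    hosts_used = 0
    for vm in vms:
        for host in range(hosts_used + 1):  # index hosts_used is a fresh host
            trial = dict(placement)
            trial[vm] = host
            apps = {name for name, _ in trial}
            if all(app_tail_latency(name, trial) <= slo_ms for name in apps):
                placement = trial
                hosts_used = max(hosts_used, host + 1)
                break
        else:
            raise ValueError(f"cannot place {vm} within {slo_ms} ms")
    return placement
```

With these toy numbers, a three-tier application fits on one host under a 150 ms bound, while a 140 ms bound pushes the database tier onto a second host: the cross-host penalty is then cheaper than the added interference. A greedy heuristic of this kind gives no optimality guarantee, which is one reason to formulate the problem as an explicit optimization instead.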

Full Text