Abstract

Cloud service providers use virtualization to host multiple applications on a single server such that each application has its own configuration and allocated resources to fulfill its application-specific demands and requirements. Furthermore, service providers use resource overbooking to increase server utilization and thereby their profit. However, the primary drawback of resource overbooking is performance interference between hosted virtual machines (VMs) collocated on the same physical host. Performance interference significantly affects application performance and its other quality-of-service (QoS) properties. The interference level depends on the types of collocated workloads and the resources they contend for. For example, collocating multiple VMs with memory-intensive workloads on the same physical platform (host) can lead to a high cache miss rate because of their high demand for memory access. Thus, performance interference can be minimized by introducing a smart placement or migration strategy for VMs based on their workload types, such that VMs whose workloads impose maximum demands on mutually distinct resources are placed together. Addressing these challenges requires a runtime performance model of the system so that runtime decisions on VM placement and migration can be made by a controller that incorporates the model. In this work, we propose a model-based, data-driven approach that abstracts the runtime behavior and characteristics of different collocated workloads, which are known to impact the performance interference level. Recent research efforts have applied big data analytics methodologies to analyze and model cloud infrastructure so that businesses can adopt autonomous, machine-based decision-making solutions.
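The workload-aware placement idea above can be illustrated with a minimal sketch. The function and data below are hypothetical (not from the paper): each host tracks the dominant resource type of its already-placed VMs, and an incoming VM is placed on the host with the fewest collocated VMs of the same type, approximating the goal of pairing workloads with mutually distinct resource demands.

```python
# Hypothetical interference-aware placement heuristic (illustrative only):
# place each incoming VM on the host whose current VMs least overlap with
# the VM's dominant resource demand.

def place_vm(vm_type, hosts):
    """hosts maps a host name to the list of workload types already placed
    on it; pick the host with the fewest VMs of the same dominant type."""
    return min(hosts, key=lambda h: sum(t == vm_type for t in hosts[h]))

# Example cluster state: h1 is memory-heavy, h2 and h3 are not.
hosts = {"h1": ["memory", "memory"], "h2": ["cpu"], "h3": ["network", "cpu"]}
print(place_vm("memory", hosts))  # → h2 (no memory-bound VMs there)
print(place_vm("cpu", hosts))     # → h1 (no CPU-bound VMs there)
```

A real controller would replace the type labels with measured per-resource utilization and fold in migration cost, but the selection step has the same shape.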
To that end, our approach uses a machine learning algorithm to learn an online performance model of the collocated workloads from measured data and provide real-time predictive analysis of performance interference. Moreover, our approach relearns or updates the predictive model online to cope with runtime changes in VM workloads. Our approach consists of three main steps that run repeatedly. First, we use a Gaussian Process (GP) model as a data-driven machine learning technique to learn the workload performance model from the aggregated data. Second, the GP predictive model is used to predict the workload performance on each host. Lastly, a machine-based VM placement/migration decision is made to minimize the system cost, represented by the interference level. Currently, our work targets only a homogeneous cloud infrastructure that hosts different VMs with different workload characteristics, e.g., varying CPU, memory, and/or network workloads. Our ongoing work is implementing the solution in a local cloud infrastructure consisting of five homogeneous hosts. We aggregate the average workload data and its corresponding performance level from each host every 15 minutes. The workload model depends mainly on data aggregated from the running VMs, such as memory utilization values. The performance interference level, on the other hand, depends primarily on shared platform measurements, for example the host's cache miss ratio. We use both VM and host measurements to learn our performance model.
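The GP modeling step can be sketched with standard GP regression: fit a posterior over an interference metric given per-VM workload features, then query the posterior mean and variance at a new operating point. The features, targets, and kernel hyperparameters below are assumptions for illustration, not the paper's actual measurements or model.

```python
import numpy as np

def rbf_kernel(A, B, length_scale=1.0, variance=1.0):
    # Squared-exponential (RBF) kernel between the row vectors of A and B.
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_predict(X_train, y_train, X_test, noise=1e-2):
    # Standard GP regression posterior: mean and per-point variance.
    K = rbf_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf_kernel(X_test, X_train)
    Kss = rbf_kernel(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, np.diag(cov)

# Hypothetical aggregated samples: [mem_util, cpu_util] per host, with the
# host's observed cache miss ratio as the interference target.
X = np.array([[0.2, 0.3], [0.5, 0.4], [0.8, 0.7], [0.9, 0.9]])
y = np.array([0.05, 0.12, 0.30, 0.45])

# Predict interference for a new candidate collocation profile.
mean, var = gp_predict(X, y, np.array([[0.85, 0.8]]))
```

The posterior variance is what makes a GP attractive here: the controller can distinguish confident predictions from ones where more monitoring data is needed before committing to a migration.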
