Abstract

Cloud service providers improve resource utilization by co-locating latency-critical (LC) workloads with best-effort (BE) batch jobs in datacenters. However, they usually treat an LC workload as a whole when allocating resources to BE jobs and neglect the differing characteristics of the workload's components. This coarse-grained co-location leaves significant room for improvement in resource utilization. Based on the observation that different LC components tolerate interference to different degrees, we propose a new abstraction called Servpod, which is a collection of the parts of an LC workload that are deployed together on the same physical machine, and show its merits in building a fine-grained co-location framework. The key idea is to vary the amount of BE work co-located with each LC Servpod: a Servpod with high interference tolerance can be deployed alongside more BE jobs. Based on Servpods, we present Rhythm, a co-location controller that maximizes resource utilization while guaranteeing the LC service's tail latency requirement. It quantifies the interference tolerance of each Servpod by analyzing that Servpod's contribution to tail latency. We evaluate Rhythm using LC services in the form of containerized processes and microservices, and find that it can improve system throughput by 31.7%, CPU utilization by 26.2%, and memory bandwidth utilization by 34% while guaranteeing the service level agreement (SLA).
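
To make the key idea concrete, the following is a minimal illustrative sketch, not Rhythm's actual controller, whose algorithm the abstract does not specify. It assumes each Servpod has already been assigned a tail-latency contribution as an integer percentage, and it assumes (purely for illustration) that interference tolerance is 100 minus that percentage. The names `Servpod`, `latency_contribution_pct`, and `be_job_budget`, as well as the example numbers, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Servpod:
    name: str
    # Hypothetical input: share of the LC service's tail latency
    # attributed to this Servpod, as an integer percentage (0..100).
    latency_contribution_pct: int


def be_job_budget(servpods, total_be_jobs):
    """Split a BE job budget across Servpods in proportion to tolerance.

    Assumption for this sketch: tolerance = 100 - latency contribution,
    so Servpods that contribute less to tail latency receive more BE jobs.
    Integer floor division is used, so a few jobs may remain unassigned.
    """
    tolerance = {s.name: max(0, 100 - s.latency_contribution_pct) for s in servpods}
    total_tolerance = sum(tolerance.values())
    if total_tolerance == 0:
        # No Servpod can tolerate interference: co-locate nothing.
        return {name: 0 for name in tolerance}
    return {
        name: total_be_jobs * t // total_tolerance
        for name, t in tolerance.items()
    }


# Made-up example values, not measurements from the paper.
pods = [
    Servpod("frontend", 60),
    Servpod("cache", 30),
    Servpod("db", 10),
]
budget = be_job_budget(pods, total_be_jobs=20)
```

In this made-up example the Servpod with the smallest latency contribution (`db`) receives the largest share of BE jobs, which mirrors the coarse-to-fine shift the abstract describes; a real controller would also need to monitor tail latency at run time and adjust the allocation.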
