The Queuing-First Approach for Tail Management of Interactive Services

Amirhossein Mirhosseini, Thomas F Wenisch

doi:10.1109/mm.2019.2897671

Abstract

Managing high-percentile tail latencies is key to designing user-facing cloud services. Rare system hiccups or unusual code paths make some requests take 10× to 100× longer than the average. Prior work seeks to reduce tail latency primarily by addressing the root causes of slow requests. However, the bulk of the requests that make up the tail are often not these rare slow-to-execute requests. Rather, due to head-of-line blocking, most of the tail comprises requests enqueued behind slow-to-execute requests. Under high-disparity service distributions, queuing effects drastically magnify the impact of rare system hiccups and can result in high tail latencies even under modest load. We demonstrate that improving the queuing behavior of a system often yields greater benefit than mitigating the individual system hiccups that increase service time tails. We suggest two general directions to improve system queuing behavior, server pooling and common-case service acceleration, and discuss circumstances where each is most beneficial.
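The head-of-line blocking argument can be illustrated with a small simulation. The sketch below is not the authors' model or code: it assumes a single-server FIFO queue with Poisson arrivals, a two-point service distribution (a fast request takes 1 time unit, a rare "hiccup" request takes 100), a hiccup probability of 0.1%, and 50% load. All of these values, and the function names `simulate_fifo` and `tail_breakdown`, are hypothetical choices made for illustration. It reports the 99th-percentile latency and the share of tail requests that were themselves slow to execute.

```python
import random


def simulate_fifo(n_requests=200_000, load=0.5, fast=1.0, slow=100.0,
                  p_slow=0.001, seed=1):
    """Simulate a single-server FIFO queue with Poisson arrivals.

    All parameter values are illustrative assumptions, not figures
    taken from the paper. Returns a list of (latency, is_slow) pairs.
    """
    rng = random.Random(seed)
    mean_service = (1 - p_slow) * fast + p_slow * slow
    arrival_rate = load / mean_service  # load = arrival_rate * mean_service

    wait = 0.0  # queuing delay seen by the current request
    records = []
    for _ in range(n_requests):
        is_slow = rng.random() < p_slow
        service = slow if is_slow else fast
        records.append((wait + service, is_slow))
        # Lindley recursion: the next request waits for whatever work
        # is still unfinished when it arrives.
        gap = rng.expovariate(arrival_rate)
        wait = max(0.0, wait + service - gap)
    return records


def tail_breakdown(records, percentile=0.99):
    """Return (latency threshold, share of tail requests that were slow)."""
    latencies = sorted(latency for latency, _ in records)
    index = min(int(percentile * len(latencies)), len(latencies) - 1)
    threshold = latencies[index]
    tail = [is_slow for latency, is_slow in records if latency >= threshold]
    return threshold, sum(tail) / len(tail)


if __name__ == "__main__":
    p99, slow_share = tail_breakdown(simulate_fifo())
    print(f"p99 latency: {p99:.1f} time units")
    print(f"share of tail requests that were themselves slow: {slow_share:.1%}")
```

Under these assumptions, slow requests make up only about 0.1% of all requests, so they cannot account for more than a small fraction of the slowest 1%; the remainder of the tail consists of fast requests that waited behind a slow one.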

Full Text