Abstract

Interactive services often have large-scale parallel implementations. To deliver fast responses, the median and tail latencies of a service's components must be low. In this paper, we explore the hardware, OS, and application-level sources of poor tail latency in high-throughput servers executing on multi-core machines. We model these network services as a queuing system in order to establish the best-achievable latency distribution. Using fine-grained measurements of three different servers (a null RPC service, Memcached, and Nginx) on Linux, we then explore why these servers exhibit significantly worse tail latencies than queuing models alone predict. The underlying causes include interference from background processes, request re-ordering caused by poor scheduling or constrained concurrency models, suboptimal interrupt routing, CPU power saving mechanisms, and NUMA effects. We systematically eliminate these factors and show that Memcached can achieve a median latency of 11 μs and a 99.9th percentile latency of 32 μs at 80% utilization on a four-core system. In comparison, a naive deployment of Memcached at the same utilization on a single-core system has a median latency of 100 μs and a 99.9th percentile latency of 5 ms. Finally, we demonstrate that tradeoffs exist between throughput, energy, and tail latency.
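To make the "best-achievable latency distribution" idea concrete, the sketch below computes sojourn-time percentiles for an M/M/1 queue, whose response time is exponentially distributed with rate (μ − λ). This is a minimal illustration only: the abstract does not specify the queuing model, and the 10 μs mean service time and the mm1_latency_percentile helper are hypothetical choices for demonstration.

```python
import math

def mm1_latency_percentile(p, service_time_s, utilization):
    """p-th percentile of response time in an M/M/1 queue.

    Response time is Exp(mu - lambda), so the p-th percentile is
    -ln(1 - p) / (mu - lambda).
    """
    mu = 1.0 / service_time_s   # service rate (requests/second)
    lam = utilization * mu      # arrival rate at the given utilization
    return -math.log(1.0 - p) / (mu - lam)

# Illustrative numbers only: 10 us mean service time, 80% utilization.
S = 10e-6
for p in (0.5, 0.99, 0.999):
    t = mm1_latency_percentile(p, S, 0.80)
    print(f"{p:.3f} percentile: {t * 1e6:.1f} us")
```

At these assumed parameters the model's median is roughly 35 μs and its 99.9th percentile roughly 345 μs; observed tails far above such model predictions are what the paper attributes to the OS and hardware factors listed above.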
