Latency-critical applications running in modern datacenters exhibit irregular request arrival patterns and are implemented using multiple services with strict latency requirements (30–250μs). These characteristics render existing energy-saving idle CPU sleep states ineffective due to the performance overhead caused by the state’s transition latency. Besides the state transition latency, another important contributor to the performance overhead of sleep states is the cold-start latency, or in other words, the time required to warm up the microarchitectural state (e.g., cache contents, branch predictor metadata) that is flushed or discarded when transitioning to a lower-power state. Both the transition latency and cold-start latency can be particularly detrimental to the performance of latency critical applications with short execution times. While prior work focuses on mitigating the effects of transition and cold-start latency by optimizing request scheduling, in this work we propose a redesign of the core C-state architecture for latency-critical applications. In particular, we introduce C6Awarm, a new Agile core C-state that drastically reduces the performance overhead caused by idle sleep state transition latency and cold-start latency while maintaining significant energy savings. C6Awarm achieves its goals by (1) implementing medium-grained power gating, (2) preserving the microarchitectural state of the core, and (3) keeping the clock generator and PLL active and locked. Our analysis for a set of microservices based on an Intel Skylake server shows that C6Awarm manages to reduce the energy consumption by up to 70% with limited performance degradation (at most 2%).
Read full abstract