Latency-critical applications running in modern datacenters exhibit irregular request arrival patterns and are implemented as multiple services with strict latency requirements (30–250 μs). These characteristics render existing energy-saving idle CPU sleep states ineffective because of the performance overhead caused by each state's transition latency. Besides the transition latency, another important contributor to the performance overhead of sleep states is the cold-start latency, that is, the time required to warm up microarchitectural state (e.g., cache contents and branch predictor metadata) that is flushed or discarded when transitioning to a lower-power state. Both the transition latency and the cold-start latency can be particularly detrimental to the performance of latency-critical applications with short execution times. While prior work focuses on mitigating the effects of transition and cold-start latency by optimizing request scheduling, in this work we propose a redesign of the core C-state architecture for latency-critical applications. In particular, we introduce C6Awarm, a new Agile Core C-state that drastically reduces the performance overhead caused by idle sleep state transition latency and cold-start latency while maintaining significant energy savings. C6Awarm achieves its goals by 1) implementing medium-grained power gating, 2) preserving the microarchitectural state of the core, and 3) keeping the clock generator and PLL active and locked. Our analysis of a set of microservices on an Intel Skylake-based server shows that C6Awarm reduces energy consumption by up to \(70\%\) with limited performance degradation (at most \(2\%\)).
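To make the trade-off concrete, the toy model below treats the wake-up penalty of an idle state as its transition (exit) latency plus its cold-start latency, and measures that penalty relative to a request's service time. It is a minimal sketch for intuition only, not the paper's evaluation methodology: the state names (`shallow`, `deep`, `deep_warm`), their power values, and their latencies are all hypothetical, and `pick_state` is a hypothetical helper that simply chooses the lowest-power state whose relative slowdown fits a budget. The `deep_warm` entry loosely mimics the idea of a low-power state that preserves microarchitectural state and wakes quickly.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class IdleState:
    name: str
    power_w: float          # hypothetical power while resident in the state
    exit_latency_us: float  # hypothetical transition (wake-up) latency
    cold_start_us: float    # hypothetical time to re-warm caches/predictors


def wake_penalty_us(s: IdleState) -> float:
    # Delay added to the first request after waking: transition + cold start.
    return s.exit_latency_us + s.cold_start_us


def slowdown(s: IdleState, service_us: float) -> float:
    # Relative latency increase for a request with the given service time.
    return wake_penalty_us(s) / service_us


def idle_energy_uj(s: IdleState, idle_us: float) -> float:
    # Energy spent while idling in the state: W * us = uJ.
    return s.power_w * idle_us


def pick_state(states: Sequence[IdleState], service_us: float,
               max_slowdown: float) -> Optional[IdleState]:
    # Hypothetical policy: lowest-power state whose slowdown fits the budget.
    ok = [s for s in states if slowdown(s, service_us) <= max_slowdown]
    return min(ok, key=lambda s: s.power_w) if ok else None


# All values are illustrative assumptions, not measurements.
STATES = [
    IdleState("shallow", power_w=4.0, exit_latency_us=1.0, cold_start_us=0.0),
    IdleState("deep", power_w=0.5, exit_latency_us=50.0, cold_start_us=30.0),
    IdleState("deep_warm", power_w=1.5, exit_latency_us=1.0, cold_start_us=0.0),
]

if __name__ == "__main__":
    service_us = 100.0  # a short request, within the 30-250 us range above
    for s in STATES:
        print(f"{s.name:10s} slowdown={slowdown(s, service_us):.2f} "
              f"idle_energy(500us)={idle_energy_uj(s, 500.0):.0f} uJ")
    best = pick_state(STATES, service_us, max_slowdown=0.02)
    print("chosen:", best.name if best else None)
```

With a 100 μs request and a 2% slowdown budget, the hypothetical `deep` state is rejected because its combined penalty (80 μs) dominates the service time, while `deep_warm` is chosen over `shallow` because it meets the same budget at lower idle power.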