This paper presents ThermoRing, a scheduling strategy based on model predictive control that reduces cooling costs in data centers. ThermoRing uses an online feedback control mechanism to improve the thermal management of energy-efficient clusters in a data center. It aims to keep the maximum inlet temperatures of the nodes below a redline temperature limit with small stability errors. Importantly, ThermoRing can handle emergency conditions (e.g., node fan shutdown and unexpectedly rising task arrival rates) by dynamically balancing load among the nodes. ThermoRing incorporates a heat distribution matrix to model the thermal characteristics of a data center housing a cluster. It supports thermal management in data centers while delivering high scheduling performance and stability. Using a real-world online bookstore trace, we conduct extensive experiments to compare the performance of ThermoRing with three existing solutions (i.e., C-Oracle, Ad-hoc, and MinHR). The experimental results show that ThermoRing improves system throughput by more than 10% under regular load conditions and by 40% in emergency cases. ThermoRing also achieves significantly better energy efficiency than MinHR, which is a thermal-aware scheduler.