Abstract

Deep reinforcement learning (DRL) has shown good performance in tackling Markov decision process (MDP) problems. As DRL optimizes a long-term reward, it is a promising approach to improving the energy efficiency of data center cooling. However, enforcing the thermal safety constraint during DRL's state exploration is a main challenge. The widely adopted reward shaping approach adds a negative reward when an exploratory action results in unsafety; thus, it must experience sufficiently many unsafe states before it learns how to prevent unsafety. In this paper, we propose a safety-aware DRL framework for single-hall data center cooling control. It applies offline imitation learning and online post-hoc rectification to holistically prevent thermal unsafety during online DRL. In particular, the post-hoc rectification searches for the minimum modification to the DRL-recommended action such that the rectified action does not result in unsafety. The rectification is designed based on a thermal state transition model that is fitted using historical safe operation traces and is able to extrapolate the transitions to unsafe states explored by DRL. Extensive evaluation for chilled-water and direct-expansion-cooled data centers in two climate conditions shows that our approach saves 22.7% to 26.6% of total data center power compared with conventional control and reduces safety violations by 94.5% to 99% compared with reward shaping.
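
As a rough illustration of the post-hoc rectification idea described above, the following Python sketch projects a DRL-recommended action onto the set of actions whose predicted next thermal state stays within a safety threshold, using a learned transition model. All names (predict_temps, rectify_action, T_MAX), the scikit-learn-style regressor interface, and the use of SciPy's SLSQP solver are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from scipy.optimize import minimize

T_MAX = 30.0  # assumed safe rack-inlet temperature threshold (degrees Celsius)

def predict_temps(state, action, model):
    """Predicted next-step rack temperatures from a fitted transition model
    (assumed scikit-learn-style regressor trained on safe operation traces)."""
    x = np.concatenate([state, action]).reshape(1, -1)
    return model.predict(x).ravel()

def rectify_action(state, drl_action, model):
    """Search for the minimum modification to the DRL-recommended action
    such that the predicted thermal state does not violate the threshold."""
    def objective(a):
        # Minimum-modification criterion: squared distance from the DRL action
        return np.sum((a - drl_action) ** 2)

    constraints = [{
        "type": "ineq",
        # Feasible when every predicted temperature is at or below T_MAX
        "fun": lambda a: T_MAX - predict_temps(state, a, model),
    }]

    result = minimize(objective, x0=drl_action, constraints=constraints,
                      bounds=[(0.0, 1.0)] * len(drl_action))  # normalized setpoints
    # Fall back to the original action if the solver fails; a deployment would
    # instead apply a known-safe default action here.
    return result.x if result.success else drl_action
```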
