This work proposed a novel reinforcement learning (RL) approach, deep deterministic policy gradient with multi-stabilization network (MADDPG-MSN), which enhanced the sample efficiency of RL for optimizing the power consumption of heating, ventilation, and air conditioning (HVAC) systems. Using the multi-stabilization network trick, MADDPG-MSN learned to balance temperature control against power consumption from a limited number of interactions, expanding the potential of RL for power consumption optimization in real-world HVAC systems. In a simulated data center scenario, it reduced power usage by 28% compared with a traditional model-predictive controller, without compromising temperature control. In real-world air conditioner tests, it controlled indoor temperature comparably to the built-in controller while consuming 32% less power.