Abstract
Buildings account for a significant portion of the total energy consumption of many countries, and energy efficiency is one of the primary objectives of today's building projects. The whole building energy model (BEM), a physics-based modeling method for building thermal and energy behaviors, is widely used by building designers to predict and improve the energy performance of building designs. BEM also has potential for developing supervisory control strategies for heating, ventilation and air-conditioning (HVAC) systems, and BEM-derived control strategies may significantly improve HVAC energy efficiency compared with the commonly used rule-based strategies. However, it is challenging to use BEM for HVAC control. Firstly, BEM is a high-order model, so classical model-based control methods cannot be directly applied; heuristic search algorithms, such as genetic algorithms, are usually used for BEM-based control optimization. Secondly, BEM is computationally intensive compared with black-box or grey-box models, which limits its application to large-scale control optimization problems.

Model-free reinforcement learning (RL) is an alternative way to use BEM for HVAC control. Model-free RL is a "trial-and-error" learning method that is applicable to complex systems, so BEM can be used as a simulator to train an RL agent offline to learn an energy-efficient supervisory control strategy. However, reinforcement learning for HVAC control has not been adequately studied: most existing studies are based on over-simplified HVAC systems and a limited number of experiment scenarios.

This study develops a BEM-assisted reinforcement learning framework for energy-efficient HVAC supervisory control. The control framework uses a design-stage BEM to "learn" a control strategy via model-free RL, with a neural network model serving as the RL agent's function approximator. Through computer simulations, the control framework is evaluated in different scenarios covering four typical commercial HVAC systems, four climates, and two building thermal mass levels. The RL-trained control strategies are also evaluated for "versatility", i.e., their tolerance for variations in HVAC operational conditions; multiple "perturbed" simulators are created for this purpose, with varying weather conditions, occupancy and plug-load schedules, and indoor air temperature setpoint schedules.

The control framework achieves better-than-baseline control performance in a variable-air-volume (VAV) system (a common type of air-based secondary HVAC system) for both cooling and heating under different climates and building thermal mass levels. Compared with a baseline rule-based control strategy, the RL-trained strategies achieve clear energy savings and less "setpoint not met time" (i.e., the cumulative time during which indoor air temperature setpoints are not met). The RL-trained strategies also tolerate variations in weather conditions and occupancy/plug-load schedules; however, they have worse-than-baseline energy performance if the indoor air temperature setpoint schedules are significantly changed. The control framework also achieves reduced heating demand and improved-or-similar thermal comfort (compared with a baseline rule-based control) for a slow-response radiant heating system in all the experiment scenarios, and the RL-trained strategies achieve improved control performance in the different perturbed simulators.
However, the reward function must include a specially designed heuristic to deal with the slow thermal response and the imperfect energy metric of this system. The heuristic encourages low supply water temperature setpoint values and rewards an increasing trend of the predicted mean vote (PMV) when it is below the setpoint. This indicates that reward function design is crucial to the control performance of the framework. Control performance may be poor if the reward function is over-complicated, as shown in the experiments on a multi-chiller chilled water system. The reward function for this system consists of three complicated penalty functions corresponding to three operational constraints: the chiller cycling time, the chiller partial load ratio, and the system supply water temperature. The RL-trained control strategies violate some of these constraints significantly and achieve only a limited amount of energy savings.

This thesis also studies the effects of the complexity of the neural network model (the RL agent's function approximator) on the control and convergence performance of the framework. It is found that a complex neural network model does not necessarily lead to better control performance than a simple one, and a complex model may make the reinforcement learning hard to converge. Thus, "deep" reinforcement learning is not always a suitable choice, even though it is a popular concept in recent literature. As a general guideline, this study recommends a narrow and shallow non-linear neural network model for the control framework.

In future work, the control framework should be evaluated in more scenarios, such as more types of HVAC systems and more climate zones. It is also necessary to conduct a more comprehensive versatility analysis for a trained RL control policy. Future work should also develop an adaptive RL control method that can self-adapt to the changing characteristics of an HVAC system. Last but not least, theoretical investigations are needed to guide the future development of the control framework.
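To make the overall scheme concrete, the following is a minimal Python sketch of the offline training setup described above: a building energy model acts as the simulator, and a model-free RL agent learns a supervisory control policy by trial and error against it. The BEMSimulator stub and the tabular Q-learning update are illustrative placeholders only; the thesis uses an EnergyPlus-based simulator and a neural network function approximator.

```python
# Minimal sketch: model-free RL trained offline against a BEM-based simulator.
# BEMSimulator is a hypothetical stand-in for a co-simulation wrapper around a
# whole-building energy model; its dynamics and reward are placeholders.
import random

class BEMSimulator:
    def reset(self):
        self.t = 0
        return 0  # discretized state (e.g., binned zone temperature / weather)

    def step(self, action):
        self.t += 1
        next_state = random.randrange(4)                 # placeholder dynamics
        reward = -abs(action - 1) - random.random()      # placeholder energy/comfort signal
        done = self.t >= 24                              # one simulated day per episode
        return next_state, reward, done

def train(episodes=50, n_states=4, n_actions=3, alpha=0.1, gamma=0.99, eps=0.1):
    q = [[0.0] * n_actions for _ in range(n_states)]
    env = BEMSimulator()
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection over supervisory setpoint choices
            a = random.randrange(n_actions) if random.random() < eps else max(
                range(n_actions), key=lambda i: q[s][i])
            s2, r, done = env.step(a)
            target = r + (0.0 if done else gamma * max(q[s2]))
            q[s][a] += alpha * (target - q[s][a])
            s = s2
    return q

q_table = train()
```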
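The reward-shaping heuristic described for the radiant heating system can be sketched as follows. The weights, normalization, and thresholds below are illustrative assumptions, not the exact formulation used in the thesis.

```python
# Illustrative shaped reward for a slow-response radiant heating system:
# penalize high supply water temperature setpoints (a proxy for heating demand)
# and reward an increasing PMV trend whenever PMV is below the comfort setpoint.
def shaped_reward(stpt_c, stpt_min_c, stpt_max_c,
                  pmv_now, pmv_prev, pmv_setpoint=0.0,
                  w_energy=1.0, w_comfort=1.0):
    # Energy term: lower supply water temperature setpoints score higher.
    stpt_norm = (stpt_c - stpt_min_c) / (stpt_max_c - stpt_min_c)
    energy_term = 1.0 - min(max(stpt_norm, 0.0), 1.0)

    # Comfort term: if PMV is below the setpoint (too cold), reward an
    # increasing PMV trend; otherwise reward closeness to the setpoint.
    if pmv_now < pmv_setpoint:
        comfort_term = 1.0 if pmv_now > pmv_prev else 0.0
    else:
        comfort_term = max(0.0, 1.0 - abs(pmv_now - pmv_setpoint))

    return w_energy * energy_term + w_comfort * comfort_term

# Example: 35 C setpoint in a 25-60 C range, PMV recovering from -0.8 to -0.6
print(shaped_reward(35.0, 25.0, 60.0, pmv_now=-0.6, pmv_prev=-0.8))
```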
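Finally, the recommendation of a narrow and shallow non-linear network can be illustrated with a small model definition. The layer sizes, input/output dimensions, and the choice of PyTorch are assumptions made for this sketch.

```python
# Sketch of a narrow, shallow non-linear network of the kind recommended here:
# a single small hidden layer rather than a deep stack.
import torch
import torch.nn as nn

class ShallowPolicy(nn.Module):
    def __init__(self, n_obs, n_actions, hidden=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_obs, hidden),   # single narrow hidden layer
            nn.Tanh(),                  # non-linearity
            nn.Linear(hidden, n_actions)
        )

    def forward(self, obs):
        return self.net(obs)  # scores over discrete supervisory setpoint actions

# Example: 10 observed HVAC/weather variables, 5 discrete setpoint actions
policy = ShallowPolicy(n_obs=10, n_actions=5)
logits = policy(torch.zeros(1, 10))
```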