Abstract

Deep Reinforcement Learning (DRL) has recently emerged as a way to control complex systems without the need to model them. However, since weeks-long experiments are needed to assess the performance of a building controller, practitioners still have to rely on accurate simulation environments to train and tune DRL agents in tractable amounts of time before deploying them, shifting the burden back to the original issue of designing complex models. In this work, we show that it is possible to learn control policies on simple black-box linear room temperature models, thereby alleviating the heavy engineering usually required to build accurate surrogates. We develop a black-box pipeline that takes historical data as input and produces room temperature control policies. The trained DRL agents are capable of beating industrial rule-based controllers both in terms of energy consumption and comfort satisfaction, using novel penalties in the reward function to introduce expert knowledge, i.e. to incentivize agents to follow expected behaviors. Moreover, one of the best agents was deployed on a real building for one week and was able to save energy while maintaining adequate comfort levels, indicating that low-complexity models might be enough to learn control policies that perform well on real buildings.
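The paper's exact reward formulation is not reproduced here; the sketch below only illustrates how an energy term, a comfort-band penalty, and an expert-knowledge penalty could be combined into a single scalar reward. All names, weights, and the specific expert-knowledge term (discouraging heating when the room is already above the comfort band) are illustrative assumptions, not the authors' design.

```python
# Minimal, illustrative reward sketch: weights and penalty shapes are assumptions,
# not the reward function used in the paper.
def reward(room_temp, heating_power, comfort_low=21.0, comfort_high=23.0,
           w_energy=1.0, w_comfort=5.0, w_expert=2.0):
    # Energy term: penalize consumed heating power.
    energy_penalty = w_energy * heating_power

    # Comfort term: penalize the distance to the comfort band.
    if room_temp < comfort_low:
        comfort_penalty = w_comfort * (comfort_low - room_temp)
    elif room_temp > comfort_high:
        comfort_penalty = w_comfort * (room_temp - comfort_high)
    else:
        comfort_penalty = 0.0

    # Expert-knowledge term (hypothetical): discourage heating when the room
    # is already above the comfort band, i.e. enforce an expected behavior.
    expert_penalty = w_expert * heating_power if room_temp > comfort_high else 0.0

    # DRL agents maximize reward, so the penalties enter with a negative sign.
    return -(energy_penalty + comfort_penalty + expert_penalty)
```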

Highlights

  • Today, most buildings are still controlled using heuristic rules, which are known to be suboptimal in terms of energy savings and occupant comfort satisfaction

  • Since weeks-long experiments are needed to assess the performance of a building controller, practitioners still have to rely on accurate simulation environments to train and tune Deep Reinforcement Learning (DRL) agents in tractable amounts of time before deploying them, shifting the burden back to the original issue of designing complex models

  • Main contributions: To alleviate the required model engineering, we develop a black-box pipeline from historical data to room temperature control policies (sketched after this list), similar to [7], with two key contributions: (i) Using linear models to mitigate the extrapolation errors of black-box models and a novel reward function, we learn control policies that beat rule-based controllers in simulation both in terms of energy consumption and comfort satisfaction. (ii) One of the best agents was deployed on the real building for one week and performed better than its rule-based counterpart, saving energy while maintaining adequate comfort levels, demonstrating that the learned policy remains effective beyond simulation
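The authors' implementation is not reproduced here; the following is a minimal sketch of how such a pipeline could look, assuming a gym-style environment built around a pre-identified linear room temperature model and an off-the-shelf Soft Actor-Critic agent from stable-baselines3. The model coefficients, state definition, reward, and algorithm choice are placeholders, not the paper's setup.

```python
# Illustrative pipeline sketch (not the authors' implementation): wrap a
# pre-identified linear room temperature model in a gym-style environment
# and train an off-the-shelf DRL agent on it.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import SAC  # the algorithm choice is an assumption


class LinearRoomEnv(gym.Env):
    """Room temperature environment driven by a black-box linear model."""

    def __init__(self, a=0.95, b_heat=0.3, b_out=0.05):
        # Placeholder coefficients of a one-step linear temperature model:
        # T[k+1] = a * T[k] + b_heat * u[k] + b_out * T_out[k]
        self.a, self.b_heat, self.b_out = a, b_heat, b_out
        self.observation_space = spaces.Box(low=-10.0, high=40.0, shape=(2,), dtype=np.float32)
        self.action_space = spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.temp, self.t_out, self.steps = 20.0, 5.0, 0
        return np.array([self.temp, self.t_out], dtype=np.float32), {}

    def step(self, action):
        u = float(action[0])
        self.temp = self.a * self.temp + self.b_heat * u + self.b_out * self.t_out
        # Simplified reward: trade off heating energy against comfort violations.
        comfort_violation = max(0.0, 21.0 - self.temp) + max(0.0, self.temp - 23.0)
        reward = -(u + 5.0 * comfort_violation)
        self.steps += 1
        terminated, truncated = False, self.steps >= 96  # one day at 15-minute steps
        obs = np.array([self.temp, self.t_out], dtype=np.float32)
        return obs, reward, terminated, truncated, {}


if __name__ == "__main__":
    agent = SAC("MlpPolicy", LinearRoomEnv(), verbose=0)
    agent.learn(total_timesteps=10_000)  # toy training budget for illustration
```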


Summary

Introduction

Most buildings are still controlled using heuristic rules, which are known to be suboptimal in terms of energy savings and occupant comfort satisfaction. Model Predictive Control (MPC) relies on accurate models to find the optimal control input at each time step. Such models are hard to derive for buildings due to their complex and highly nonlinear dynamics, which leads to high development costs. Researchers have taken advantage of available data to construct black-box models to use in MPC, avoiding the complex physics-based modelling of building dynamics, as in [3]. However, these models might not follow the laws of physics and can induce complex optimization routines for MPC. In short, accurate building models are needed to develop high-performance predictive controllers, but they require significant expertise during the design phase.
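As one concrete illustration of how a black-box linear room temperature model can be identified from historical data, the sketch below fits a one-step ARX-style model with ordinary least squares. The regressor choice, column names, file name, and 15-minute sampling are hypothetical and not taken from the paper.

```python
# Illustrative identification of a black-box linear room temperature model
# from historical data via ordinary least squares. Column names, regressors,
# and the data file are assumptions for this sketch, not the paper's setup.
import numpy as np
import pandas as pd

# Hypothetical historical log: room temperature, outside temperature, heating power.
df = pd.read_csv("historical_data.csv")  # columns: T_room, T_out, u_heat

# One-step-ahead ARX-style model:
# T_room[k+1] = a * T_room[k] + b * T_out[k] + c * u_heat[k] + d
X = np.column_stack([
    df["T_room"].values[:-1],
    df["T_out"].values[:-1],
    df["u_heat"].values[:-1],
    np.ones(len(df) - 1),
])
y = df["T_room"].values[1:]

# Ordinary least squares fit of the linear coefficients.
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b, c, d = theta
print(f"T_room[k+1] = {a:.3f}*T_room[k] + {b:.3f}*T_out[k] + {c:.3f}*u_heat[k] + {d:.3f}")
```

A model identified this way can then serve directly as the simulation environment in which the DRL agent is trained, as in the pipeline sketched earlier.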

