Transfer Learning Applied to Reinforcement Learning-Based HVAC Control

Paulo Lissa, Michael Schukat, Enda Barrett

doi:10.1007/s42979-020-00146-7

Abstract

Modern control solutions for HVAC have demonstrated excellent cost and energy savings through the use of machine learning techniques. However, a challenging problem faced by most machine learning tasks is the amount of time and data required to train effective policies in the absence of prior knowledge. Considering that buildings in a specific geographical location share common environmental and structural features, this paper investigates the impact of spatial changes on performance accuracy through the use of transfer learning applied to reinforcement learning (RL) based HVAC control. We propose an adapted RL (Q-learning) algorithm that can transfer HVAC control policies, which then adjust themselves according to spatial changes. We examine the performance of our approach across multiple locations. Moreover, we analyse the user's time out comfort, comparing models with and without transfer learning. The results from different cases show that, after applying transfer learning, the time needed to train optimal or near-optimal control policies was reduced by more than a factor of 6 compared with experiments without it. Furthermore, the test case where the spatial variation was lower than 50% achieved similar performance for both dynamic and static HVAC control, presenting an average time out comfort error of 2.55% and 3.83%, respectively. From the user's perspective, this means they will not feel any additional discomfort, as the number of minutes out of the comfort zone for the static version is approximately the same over a 1-day interval. Finally, when examining the effect of transfer learning on geographical changes, the proposed method demonstrated higher performance in countries where the temperature variation is lower, reducing time out comfort by one-third.
If an agent receives a policy from a place where the environmental conditions are very different, the proposed method will still work and find the best policy, but not as fast as when the policy comes from a similar place.
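The idea of transferring a learned control policy can be illustrated with a minimal sketch. The abstract does not specify how the adapted Q-learning algorithm performs the transfer, so the code below assumes the simplest form: the target agent starts from a copy of the source agent's Q-table instead of zeros and then keeps learning with the standard Q-learning update. The toy room model, comfort band, reward, and all names (`new_q_table`, `q_update`, `step`, `train`) are hypothetical and chosen only for illustration.

```python
import random

COMFORT = (20, 23)      # assumed comfort band in degrees Celsius
T_MIN, T_MAX = 15, 27   # assumed range of discretised indoor temperatures
N_STATES = T_MAX - T_MIN + 1
N_ACTIONS = 2           # 0 = heating off, 1 = heating on


def new_q_table(source=None):
    """Return a zero Q-table, or a copy of a source table (the assumed transfer step)."""
    if source is None:
        return [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    return [row[:] for row in source]


def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning update in place."""
    q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])


def step(temp, action, outdoor):
    """Toy room: heating adds 1 degree, otherwise the room drifts 1 degree toward outdoor."""
    if action == 1:
        temp += 1
    elif temp > outdoor:
        temp -= 1
    elif temp < outdoor:
        temp += 1
    temp = max(T_MIN, min(T_MAX, temp))
    in_comfort = COMFORT[0] <= temp <= COMFORT[1]
    # Penalise discomfort, plus a small assumed energy cost when heating.
    reward = (0.0 if in_comfort else -1.0) - 0.1 * action
    return temp, reward


def train(q, outdoor, episodes, rng, epsilon=0.1, horizon=48):
    """Epsilon-greedy Q-learning on the toy room for a fixed outdoor temperature."""
    for _ in range(episodes):
        temp = rng.randint(T_MIN, T_MAX)
        for _ in range(horizon):
            s = temp - T_MIN
            if rng.random() < epsilon:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda i: q[s][i])
            temp, r = step(temp, a, outdoor)
            q_update(q, s, a, r, temp - T_MIN)
    return q


rng = random.Random(0)
# Source location: train from scratch.
source_q = train(new_q_table(), outdoor=10, episodes=200, rng=rng)
# Target location with slightly different conditions: start from the source policy.
target_q = train(new_q_table(source_q), outdoor=12, episodes=20, rng=rng)
```

Under these assumptions, the target agent begins with value estimates that are already useful when the two locations behave similarly, which matches the intuition stated in the abstract that a policy from a similar place speeds up learning more than one from a very different place.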

Full Text