Abstract

How do we optimize heating, ventilation and air-conditioning (HVAC) systems for energy cost and occupant comfort? How do we accomplish this in an automated fashion that adapts over time with minimal human intervention? Answers to these questions have a tremendous impact on building occupant comfort, building operating costs and, importantly, environmental footprint. Understandably, this topic has received considerable attention from experts in both industry and academia. Among these works, deep learning, specifically deep reinforcement learning (DRL), is emerging as a data-driven control strategy that does not require an explicit dynamic model of the system. Another advantage of DRL, when successfully developed, is that it can continue to learn and adapt as the building/HVAC characteristics change over time. DRL agents, however, are challenging to train. On one hand, they may need months or years of training data (sample inefficiency), potentially inconveniencing building occupants and incurring high energy costs for a long time. On the other hand, they may converge to local optima or simply fail to converge. This paper highlights one strategy to mitigate some of these challenges. We show that the choice of state and action space is as important as the choice of DRL architecture and neural network training techniques. Specifically, we formulate the HVAC control problem as a Partially Observable Markov Decision Process (POMDP), build a DRL agent using Deep Q-Networks (DQN) on a building simulator, and quantify gains over a widely adopted baseline heuristic method. Subsequently, we reformulate the original problem as a restricted POMDP by severely restricting the observation (state) space and action space, and build a DRL agent for the restricted POMDP. The performance gain from this DRL agent is double that of the original agent, implying that complex state and action spaces, while information rich, can lead to complex loss landscapes that a DRL agent may not navigate well. Our larger message is this: 'less' (state-action space) can be 'more' in the context of DRL training.
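The abstract itself contains no code, but the core idea, training the same DQN architecture on a drastically smaller observation and action space, can be made concrete with a minimal sketch. The example below is hypothetical and assumes a PyTorch-style setup; the specific dimensions (a 40-dimensional observation with 125 discretized actions for the full POMDP versus 4 observations and 5 setpoint actions for the restricted one) are illustrative and not taken from the paper.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Simple MLP mapping an observation vector to one Q-value per action."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Full POMDP: many sensor channels, large discretized action grid (illustrative sizes).
full_q = QNetwork(obs_dim=40, n_actions=125)

# Restricted POMDP: a handful of observations (e.g., zone temperature,
# outdoor temperature, time of day) and a few setpoint adjustments.
restricted_q = QNetwork(obs_dim=4, n_actions=5)

def epsilon_greedy(q_net: QNetwork, obs: torch.Tensor, eps: float) -> int:
    """Standard DQN exploration: random action with probability eps, else argmax Q."""
    if torch.rand(()) < eps:
        return int(torch.randint(q_net.net[-1].out_features, ()))
    with torch.no_grad():
        return int(q_net(obs).argmax())

# Example call: pick an action for the restricted agent from a dummy observation.
action = epsilon_greedy(restricted_q, torch.zeros(4), eps=0.1)
```

The point of the sketch is that nothing about the DQN machinery changes between the two formulations; only the input/output dimensions do, which is precisely the lever the paper argues matters as much as architecture or training technique.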
