Safe Reinforcement Learning for Buildings: Minimizing Energy Use While Maximizing Occupant Comfort
With buildings accounting for 40% of global energy consumption, heating, ventilation, and air conditioning (HVAC) systems represent the single largest opportunity for emissions reduction: they consume up to 60% of commercial building energy while maintaining occupant comfort. This critical balance between energy efficiency and human comfort has traditionally relied on rule-based and model predictive control strategies. Given the multi-objective nature and complexity of modern HVAC systems, these approaches fall short of satisfying both objectives. Recently, reinforcement learning (RL) has emerged as a method capable of learning optimal control policies directly from system interactions without requiring explicit models. However, standard RL approaches frequently violate comfort constraints during exploration, making them unsuitable for real-world deployment where occupant comfort cannot be compromised. This paper addresses two fundamental challenges in HVAC control: the difficulty of constrained optimization in RL and the challenge of defining appropriate comfort constraints across diverse conditions. We adopt a safe RL framework with neural barrier certificates that (1) transforms the constrained HVAC problem into an unconstrained optimization and (2) constructs these certificates in a data-driven manner using neural networks, adapting to building-specific comfort patterns without manual threshold setting. This approach enables the agent to come close to guaranteeing solutions that improve energy efficiency while respecting the defined comfort limits. We validate our approach through seven experiments spanning residential and commercial buildings, from single-zone heat pump control to five-zone variable air volume (VAV) systems. Our safe RL framework achieves energy reduction compared to baseline operation while maintaining higher comfort compliance than unconstrained RL.
The data-driven barrier construction discovers building-specific comfort patterns, enabling context-aware optimization that is impossible with fixed thresholds. Neural approximation prevents absolute safety guarantees, but the approach reduces catastrophic safety failures compared to unconstrained RL while maintaining adaptability, which positions it as a developmental bridge between RL theory and real-world building automation. However, the considerable gap in both safety and energy performance relative to rule-based control indicates that the method requires substantial improvement before practical deployment.
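The idea of turning a comfort-constrained problem into an unconstrained one can be illustrated with a small sketch. It is not the paper's implementation: the hand-written quadratic `barrier` below merely stands in for the learned neural barrier certificate, and the 20 to 24 °C comfort band, the `penalty_weight`, and all function names are assumptions chosen for illustration.

```python
def barrier(temp_c, low=20.0, high=24.0):
    """Hand-written stand-in for a learned barrier certificate (assumption).

    Returns a value <= 0 inside the assumed comfort band [low, high]
    and a value > 0 outside it.
    """
    center = (low + high) / 2.0
    half_width = (high - low) / 2.0
    return ((temp_c - center) / half_width) ** 2 - 1.0


def shaped_reward(energy_kwh, temp_c, penalty_weight=10.0):
    """Unconstrained reward: energy cost plus a penalty on barrier violation.

    The comfort constraint is folded into the reward, so a standard RL
    algorithm can optimize it without an explicit constraint handler.
    """
    violation = max(0.0, barrier(temp_c))  # zero whenever the state is "safe"
    return -energy_kwh - penalty_weight * violation


if __name__ == "__main__":
    for temp in (22.0, 24.0, 26.0):
        print(temp, barrier(temp), shaped_reward(1.0, temp))
```

In a data-driven variant, `barrier` would be replaced by a network trained on building data, which is where the absence of an absolute safety guarantee comes from.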
- # Heating, Ventilation, And Air Conditioning
- # Reinforcement Learning
- # Heating, Ventilation, And Air Conditioning Control
- # Maintaining Occupant Comfort
- # Safe Reinforcement Learning
- # Occupant Comfort
- # Variable Air Volume
- # Minimizing Energy Use
- # Reinforcement Learning Theory
- # Model Predictive Control Strategies
- Conference Article
7
- 10.23919/icue-gesd.2018.8635595
- Oct 1, 2018
Energy savings and occupant thermal comfort are the two most important factors in controlling heating, ventilation, and air conditioning (HVAC) operation in buildings. Typically, it is found that thermal comfort is not always met in buildings. Hence, there is still an opportunity to improve indoor thermal comfort and, at the same time, save energy by controlling HVAC set points. The objective of this paper is to propose a method to obtain energy savings by adjusting HVAC set points based on occupant comfort measured using Predicted Mean Vote (PMV) and occupancy information. The idea is to calculate hourly PMV values based on real-time occupancy information, indoor temperature set points, and humidity in a building. Then, a new set of temperature set points that can maintain occupant comfort, i.e., PMV = 0, is derived. To evaluate the effectiveness of the proposed method, a building model is developed in eQUEST using the information from a real-world building located in Alexandria, VA. Research findings indicate that HVAC electrical consumption savings of 14.58% are achieved when the proposed set point adjustment method is implemented, compared to the base case. To study the impact of adding occupancy information on HVAC energy savings, another scenario is simulated in which the HVAC set point is increased when the building is unoccupied, e.g., during lunchtime or holidays. Research findings indicate that additional HVAC electrical consumption savings of 8.79% are achieved when occupancy information is taken into account in HVAC control.
- Dissertation
- 10.1184/r1/9913883.v1
- Oct 2, 2019
Buildings account for a significant portion of the total energy consumption of many countries. Energy efficiency is one of the primary objectives of today’s building projects. Whole building energy model (BEM), a physics-based modeling method for building thermal and energy behaviors, is widely used by building designers to predict and improve the energy performance of building design. BEM also has potential for developing supervisory control strategies for heating, ventilation and air-conditioning (HVAC) systems. The BEM-derived control strategies may significantly improve HVAC energy efficiency compared to the commonly used rule-based control strategies. However, it is challenging to use BEM for HVAC control. This is because, firstly, BEM is a high-order model, so classical model-based control methods cannot be directly applied. Heuristic search algorithms, such as genetic algorithms, are usually used for BEM-based control optimization. Secondly, BEM is computationally intensive compared to other black-box or grey-box models, which limits its application for large-scale control optimization problems. Model-free reinforcement learning (RL) is an alternative method to use BEM for HVAC control. Model-free RL is a “trial-and-error” learning method that is applicable to any complex system. As a result, BEM can be used as a simulator to train an RL agent offline to learn an energy efficient supervisory control strategy. However, reinforcement learning for HVAC control has not been adequately studied. Most existing studies are based on over-simplified HVAC systems and a limited number of experiment scenarios. This study develops a BEM-assisted reinforcement learning framework for HVAC supervisory control for energy efficiency. The control framework uses a design-stage BEM to “learn” a control strategy via model-free RL. The RL agent is a neural network model which performs as a function approximator.
Through computer simulations, the control framework is evaluated in different scenarios covering four typical commercial HVAC systems, four climates, and two building thermal mass levels. The RL-trained control strategies are also evaluated for “versatility”, i.e., the tolerance for the variations of HVAC operational conditions. Multiple “perturbed” simulators are created for this purpose, with varying weather conditions, occupancy and plug-load schedules, and indoor air temperature setpoint schedules. The control framework has achieved better-than-baseline control performance in a variable-air-volume (VAV) system (a common type of air-based secondary HVAC system) for both cooling and heating under different climates and building thermal mass levels. Compared to a baseline rule-based control strategy, the RL-trained strategies can achieve obvious energy savings and less “setpoint not met time” (i.e., the cumulative time that indoor air temperature setpoints are not met). Also, the RL-trained strategies can tolerate the variations in weather conditions and occupancy/plug-load schedules. However, the RL-trained control strategies have worse-than-baseline energy performance if indoor air temperature setpoint schedules are significantly changed. The control framework has also achieved reduced heating demand and improved-or-similar thermal comfort (compared to a baseline rule-based control) for a slow-response radiant heating system in all the experiment scenarios. The RL-trained strategies have also achieved improved control performance in different perturbed simulators. However, the reward function must include a specially designed heuristic to deal with the slow thermal response and the imperfect energy metric of this system. The heuristic encourages low supply water temperature setpoint values and rewards increasing trends of the predicted mean vote (PMV) if it is below the setpoint.
This indicates that the reward function design is crucial for the control performance of this control framework. Control performance may be poor if the reward function is over-complicated, as shown in the experiments related to a multi-chiller chilled water system. The reward function for this system consists of three complicated penalty functions corresponding to three operational constraints, including the chiller cycling time, the chiller partial-load-ratio, and the system supply water temperature. The RL-trained control strategies have violated some operational constraints significantly, and only achieved a limited amount of energy savings. This thesis also studied the effects of the neural network model (the RL agent function approximator) complexity on the control and convergence performance of the control framework. It is found that a complex neural network model does not necessarily lead to better control performance compared to a simple neural network model. A complex neural network model may make the reinforcement learning hard to converge. Thus, “deep” reinforcement learning is not always a suitable choice, even though it is a popular concept in recent literature. As a general guideline, this study recommends using a narrow and shallow non-linear neural network model for the control framework. In future work, the control framework should be evaluated in more scenarios, such as more types of HVAC systems and more climate zones. It is also necessary to conduct a more comprehensive versatility analysis for a trained RL control policy. Future work should also develop an adaptive RL control method that could self-adapt to the changing characteristics of an HVAC system. Last but not least, theoretical investigations are needed to guide the future development of the control framework.
- Book Chapter
2
- 10.5772/18818
- Sep 6, 2011
Heating, ventilating, and air-conditioning (HVAC) systems differ greatly, from a control engineering standpoint, from chemical and steel processes. One of the differences is that the equilibrium point (or the operating point) usually varies with disturbances such as outdoor temperature (or weather conditions) and thermal loads. The variations of the operating point tend to vary the parameters of a plant model. Thus, it is extremely difficult to obtain an exact mathematical model of HVAC control systems (Kasahara 2000). Proportional-plus-integral (PI) controllers have been by far the most common control strategy as the complexity of the control problem increased (Astrom 1995). Today, the variable air volume (VAV) system has been universally accepted as a means of achieving an energy-efficient and comfortable building environment. While the VAV control strategies provide a high-quality environment for building occupants, the VAV system analysis rarely receives the attention it deserves. As a result, basic control strategies for the VAV system have remained unchanged up to now (Hartman 2003). In addition, when the model predictive control method is applied to HVAC systems, the control performance has been highly improved by pursuing the deviation from the operating point (Taira 2004). According to this report, by recognizing the deviation from the operating point and calculating the optimal control inputs about the newly obtained operating point at the next sampling time, the control system gives better responses than the traditional feedback control system. Motivated by the considerations in these reports, we consider room temperature and humidity controls using adjustable resets which compensate for thermal load upsets. One of the primary objectives of HVAC systems is to maintain the room air temperature and humidity at the setpoint values to provide a high-quality environment for building occupants.
The room temperature and humidity control systems may be represented in the same block diagram form as single-variable, single-loop feedback control systems because this interaction is weak relative to the desired control performance.
- Research Article
24
- 10.1016/j.buildenv.2023.111069
- Nov 28, 2023
- Building and Environment
Development of an HVAC system control method using weather forecasting data with deep reinforcement learning algorithms
- Research Article
14
- 10.1016/j.enbuild.2021.110995
- Apr 8, 2021
- Energy and Buildings
What are the impacts on the HVAC system when it provides frequency regulation? – A comprehensive case study with a Multi-Zone variable air volume (VAV) system
- Research Article
13
- 10.1186/s42162-018-0064-9
- Dec 1, 2018
- Energy Informatics
Heating, Ventilation and Air Conditioning (HVAC) consumes a significant fraction of energy in commercial buildings. Hence, the use of optimization techniques to reduce HVAC energy consumption has been widely studied. Model predictive control (MPC) is one state of the art optimization technique for HVAC control which converts the control problem to a sequence of optimization problems, each over a finite time horizon. In a typical MPC, future system state is estimated from a model using predictions of model inputs, such as building occupancy and outside air temperature. Consequently, as prediction accuracy deteriorates, MPC performance–in terms of occupant comfort and building energy use–degrades. In this work, we use a custom-built building thermal simulator to systematically investigate the impact of occupancy prediction errors on occupant comfort and energy consumption. Our analysis shows that in our test building, as occupancy prediction error increases from 5 to 20% the performance of an MPC-based HVAC controller becomes worse than that of even a simple static schedule. However, when combined with a personal environmental control (PEC) system, HVAC controllers are considerably more robust to prediction errors. Thus, we quantify the effectiveness of PECs in mitigating the impact of forecast errors on MPC control for HVAC systems.
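The dependence of MPC on forecasts described above can be made concrete with a minimal receding-horizon sketch. It does not reproduce the custom simulator from this work: the first-order thermal model in `simulate`, its coefficients, the three discrete heating levels, the cost weights, and every function name are assumptions made for illustration. The controller enumerates all action sequences over a short horizon, scores each against the outdoor and occupancy forecasts, and applies only the first action of the best plan.

```python
from itertools import product


def simulate(temp, outdoor, action, a=0.1, b=1.0):
    """One step of an assumed first-order thermal model (illustrative only)."""
    return temp + a * (outdoor - temp) + b * action


def mpc_action(temp, outdoor_forecast, occupancy_forecast, setpoint=21.0,
               actions=(0.0, 0.5, 1.0), comfort_weight=1.0, energy_weight=0.1):
    """Return the first action of the lowest-cost plan over the horizon.

    Comfort deviation is only penalized in steps where the forecast says
    the zone is occupied, so forecast errors directly change the plan.
    """
    horizon = len(outdoor_forecast)
    best_cost, best_first = float("inf"), actions[0]
    for plan in product(actions, repeat=horizon):  # brute-force search
        t, cost = temp, 0.0
        for u, t_out, occ in zip(plan, outdoor_forecast, occupancy_forecast):
            t = simulate(t, t_out, u)
            cost += energy_weight * u
            if occ:
                cost += comfort_weight * (t - setpoint) ** 2
        if cost < best_cost:
            best_cost, best_first = cost, plan[0]
    return best_first


if __name__ == "__main__":
    cold_forecast = [5.0, 5.0, 5.0]
    print(mpc_action(18.0, cold_forecast, [1, 1, 1]))  # occupied forecast
    print(mpc_action(18.0, cold_forecast, [0, 0, 0]))  # unoccupied forecast
```

With the same room state and weather, the two calls differ only in the occupancy forecast, which is the kind of sensitivity the study quantifies. At each real time step the controller would be called again with updated forecasts.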
- Research Article
170
- 10.1016/j.egyai.2020.100020
- Aug 5, 2020
- Energy and AI
Reinforcement learning for whole-building HVAC control and demand response
- Conference Article
4
- 10.1109/sustech.2017.8333531
- Nov 1, 2017
Building energy efficiency improvements are often achieved by using a building automation system (BAS) to control the environmental and comfort conditions in a building, especially the climate (i.e., temperature, humidity, etc.). The traditional method for achieving energy savings through temperature control is by scheduling the heating, ventilation and air conditioning (HVAC) units using the BAS. Care must be taken to ensure that energy savings are achieved without sacrificing occupant comfort, but validating comfort is a challenge, and metrics for comfort are generally chosen ad hoc in the development of HVAC controls. This paper explores HVAC control methods and the potential impact they may have on comfort; approaches to quantifying comfort; and research directions that may provide solutions to the comfort problem.
- Conference Article
2
- 10.1115/es2017-3105
- Jun 26, 2017
Advanced energy management control systems (EMCS), or building automation systems (BAS), offer an excellent means of reducing energy consumption in heating, ventilating, and air conditioning (HVAC) systems while maintaining and improving indoor environmental conditions. This can be achieved through the use of computational intelligence and optimization. This paper evaluates model-based optimization processes (OP) for HVAC systems utilizing any computer algebra system (CAS), genetic algorithms, and self-learning or self-tuning models (STM), which minimize the error between measured and predicted performance data. The OP can be integrated into the EMCS to perform several intelligent functions achieving optimal system performance. The development of several self-learning HVAC models and the optimization of the process (minimizing energy use) are tested using data collected from an actual HVAC system. Using this optimization process (OP), the optimal variable set points (OVSP), such as supply air temperature (Ts), supply duct static pressure (Ps), chilled water supply temperature (Tw), minimum outdoor ventilation, and chilled water differential pressure set-point (Dpw), are optimized with respect to energy use of the HVAC’s cooling side, including the chiller, pump, and fan. The optimized set point variables minimize energy use and maintain thermal comfort, incorporating ASHRAE’s new ventilation standard 62.1-2013. This research focuses primarily on: on-line, self-tuning, optimization process (OLSTOP); HVAC design principles; and control strategies within a building automation system (BAS) controller. The HVAC controller will achieve the lowest energy consumption of the cooling side while maintaining occupant comfort by performing and prioritizing the appropriate actions. The program’s algorithms analyze multiple variables (humidity, pressure, temperature, CO2, etc.) simultaneously at key locations throughout the HVAC system (pumps, cooling coil, chiller, fan, etc.) to reach the function’s objective, which is the lowest energy consumption while maintaining occupant comfort.
- Research Article
42
- 10.1016/j.jobe.2016.04.005
- Apr 20, 2016
- Journal of Building Engineering
An integrated control-oriented modelling for HVAC performance benchmarking
- Research Article
- 10.47363/jbber/2023(1)115
- Mar 30, 2023
- Journal of Biosensors and Bioelectronics Research
This comprehensive research paper investigates the paradigm shift in Heating, Ventilation, and Air Conditioning (HVAC) control systems propelled by the transformative integration of the Internet of Things (IoT). In an era marked by the convergence of digital technologies, the infusion of IoT into HVAC systems heralds a new era of dynamic, interconnected control mechanisms. This study undertakes a thorough examination of the evolving landscape, shedding light on the profound advancements, discernible benefits, and nuanced challenges intrinsic to harnessing IoT for the augmentation of HVAC control systems. The journey begins by elucidating the fundamental shifts catalyzed by the assimilation of IoT in HVAC systems. The traditional boundaries of HVAC control are transcended as interconnected devices seamlessly communicate, fostering an environment where each component becomes an intelligent node in a networked ecosystem. Real-time data exchange becomes the bedrock, facilitating a level of monitoring and control hitherto unseen. The paper explores the intricacies of this interconnectedness, unveiling the potential for granular control and adaptability that IoT ushers into HVAC operations. A focal point of this research is the exploration of the tangible benefits that arise from this symbiosis of IoT and HVAC control. The paper meticulously examines how real-time monitoring empowers system operators with unprecedented insights into performance metrics, energy consumption patterns, and environmental conditions. Harnessing this wealth of data, IoT-equipped HVAC systems demonstrate an unparalleled capacity for adaptive control, responding dynamically to fluctuating demands and external variables. The consequential improvements in energy efficiency and resource utilization contribute not only to operational cost savings but also align with global sustainability objectives. However, in the pursuit of technological advancement, challenges inevitably emerge. 
This research critically evaluates the impediments and challenges inherent in the integration of IoT into HVAC control systems. Security concerns, data privacy issues, and the evolving landscape of technology standards are among the multifaceted challenges explored in depth. The paper endeavors to provide a nuanced understanding of these challenges, offering insights that can inform the development of robust and resilient IoT-enabled HVAC control systems.
- Conference Article
123
- 10.1145/3360322.3360849
- Nov 13, 2019
Reinforcement learning (RL) was first demonstrated to be a feasible approach to controlling heating, ventilation, and air conditioning (HVAC) systems more than a decade ago. However, there has been limited progress towards a practical and scalable RL solution for HVAC control. While one can train an RL agent in simulation, it is not cost-effective to create a model for each thermal zone or building. Likewise, existing RL agents generally take a long time to learn and are opaque to expert interrogation, making them unattractive for real-world deployment. To tackle these challenges, we propose Gnu-RL: a novel approach that enables practical deployment of RL for HVAC control and requires no prior information other than historical data from existing HVAC controllers. To achieve this, Gnu-RL adopts a recently-developed Differentiable Model Predictive Control (MPC) policy, which encodes domain knowledge on planning and system dynamics, making it both sample-efficient and interpretable. Prior to any interaction with the environment, a Gnu-RL agent is pre-trained on historical data using imitation learning, which enables it to match the behavior of the existing controller. Once it is put in charge of controlling the environment, the agent continues to improve its policy end-to-end, using a policy gradient algorithm. We evaluate Gnu-RL on both an EnergyPlus model and a real-world testbed. In both experiments, our agents were directly deployed in the environment after offline pre-training on expert demonstration. In the simulation experiment, our approach saved 6.6% energy compared to the best published RL result for the same environment, while maintaining a higher level of occupant comfort. Next, Gnu-RL was deployed to control the HVAC of a real-world conference room for a three-week period. Our results show that Gnu-RL saved 16.7% of cooling demand compared to the existing controller and tracked temperature set-point better.
- Research Article
12
- 10.1145/3393666
- May 23, 2020
- ACM Transactions on Design Automation of Electronic Systems
Heating, ventilation, and air conditioning (HVAC) systems usually account for the highest percentage of overall energy usage in large-sized smart building infrastructures. The performance of HVAC control systems for large buildings strongly depends on the outside environment, building architecture, and (thermal) zone usage pattern of the building. In large buildings, an HVAC system with multiple air handling units (AHUs) is required to fulfill the cooling/heating requirements. In the present work, we propose an energy-aware building resource allocation and economic model predictive control (eMPC) framework for multi-AHU-based HVAC systems. The energy consumption of a multi-AHU-based HVAC system significantly depends on how long the AHUs are running, which in turn is governed by the zone usage demands. Our approach comprises a two-step hierarchical technique where we first minimize the running time of AHUs by suitably allocating building resources (thermal zones) to usage demands for zones. Next, we formulate a finite receding horizon control problem for trading off energy consumption against thermal comfort during HVAC operations. Given a high-level building specification and usage demand, our computer-aided design framework generates building thermal models, allocates usage demands, formulates the control scheme, and simulates it to generate power consumption statistics for the given building with usage demands. We believe that the proposed framework will help in early analysis during the design phase of energy-aware building architecture and HVAC control. The framework can also be useful from a building operator's point of view for energy-aware HVAC control as well as for satisfying smart grid demand-response events by HVAC system peak power reduction through automated control actions.
- Research Article
41
- 10.3390/en13205396
- Oct 15, 2020
- Energies
Occupancy-aware heating, ventilation, and air conditioning (HVAC) control offers the opportunity to reduce energy use without sacrificing thermal comfort. Residential HVAC systems often use manually adjusted or constant setpoint temperatures, which heat and cool the house regardless of whether it is needed. By incorporating occupancy-awareness into HVAC control, heating and cooling can be used only during the time periods when they are needed. Yet, bringing this technology to fruition is dependent on accurately predicting occupancy. Non-probabilistic prediction models offer an opportunity to use collected occupancy data to predict future occupancy profiles. Smart devices, such as a connected thermostat, which already include occupancy sensors, can be used to provide a continually growing collection of data that can then be harnessed for short-term occupancy prediction by compiling and creating a binary occupancy prediction. Real occupancy data from six homes located in Colorado are analyzed and investigated using this occupancy prediction model. Results show that non-probabilistic occupancy models in combination with occupancy sensors can provide a hybrid HVAC control with savings of 5.0% on average and without degradation of thermal comfort. Model predictive control provides further opportunities, with the ability to adjust the relative importance of thermal comfort and energy savings to achieve savings between 1% and 13.3%, depending on the relative weighting. In all cases, occupancy prediction allows the opportunity for a more intelligent and optimized strategy for residential HVAC control.
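A binary occupancy prediction of the kind described above can be sketched in a few lines. The study's actual prediction model is not specified here, so the per-hour majority vote below is an assumed stand-in, and the function names, the 21 °C occupied setpoint, and the 17 °C setback are hypothetical values chosen for illustration.

```python
def predict_occupancy(history):
    """Predict a 24-hour binary occupancy profile by per-hour majority vote.

    `history` is a list of past days, each a list of 24 values in {0, 1}
    as might be logged by a thermostat's occupancy sensor (assumption).
    """
    n_days = len(history)
    return [1 if 2 * sum(day[h] for day in history) >= n_days else 0
            for h in range(24)]


def heating_setpoints(prediction, occupied_c=21.0, setback_c=17.0):
    """Map the binary prediction to hourly heating setpoints with a setback."""
    return [occupied_c if occ else setback_c for occ in prediction]


if __name__ == "__main__":
    # Assumed pattern: home overnight and in the evening, away mid-day.
    typical = [1] * 8 + [0] * 10 + [1] * 6
    away_all_day = [0] * 24
    history = [typical, typical, away_all_day]
    profile = predict_occupancy(history)
    print(profile)
    print(heating_setpoints(profile))
```

A deterministic rule like this yields a yes/no schedule rather than a probability per hour, which is what makes it directly usable for switching between a comfort setpoint and a setback.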
- Research Article
9
- 10.1177/1420326x14540314
- Jun 24, 2014
- Indoor and Built Environment
A heating, ventilation, and air-conditioning (HVAC) system is a multi-variable, strongly coupled, large-scale system that is composed of several sub-systems. Considerable research, simulations, and experiments have been conducted on HVAC control. The optimization control of an HVAC system is now a popular topic. The ultimate goal of this paper was to achieve minimum energy consumption and improve system efficiency. Multi-zone variable air volume and variable water volume air-conditioning systems were developed. The dynamic models of HVAC sub-systems were built by the adaptive directional forgetting method. Control strategies such as the gearshift integral proportional-integral-derivative (PID) controller and self-tuning PID controller were studied in the platform to improve the dynamic characteristics of the HVAC system. System performance was improved. The system saved 18.2% of energy with the integration of iterative learning control and sequential quadratic programming based on the steady-state hierarchical optimization control scheme.