Abstract
Scheduling energy management system operations in response to fluctuating customer demand, electricity prices, and weather increases the complexity of the control system and calls for a flexible, cost-effective control policy. This study develops an intelligent, real-time battery energy storage controller based on a reinforcement learning model, focusing on grid-connected residential houses equipped with solar photovoltaic panels and a battery energy storage system. Because the performance of reinforcement learning depends heavily on the design of the underlying Markov decision process, a cyclic time-dependent Markov process is designed to capture the daily cyclic patterns in demand, electricity price, and solar energy. This Markov process is used in a Q-learning algorithm, yielding more efficient battery energy control and lower electricity costs. The proposed Q-learning algorithm is compared with two benchmarks: a deterministic equivalent solution and a one-step rollout algorithm. Numerical experiments show that the gap in one-month electricity cost between the deterministic equivalent solution and the Q-learning approach decreases from 7.99% to 3.63% for house 27 and from 6.91% to 3.26% for house 387 when the discretization size of demand, solar energy, price, and battery energy level is set to 20. The proposed Q-learning algorithm also outperforms the one-step rollout algorithm. Moreover, the effects of the discretization size of the state-space parameters on the adaptive Q-learning performance and computational time are investigated; variations in the electricity price affect the Q-learning algorithm's performance more than the other parameters.
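To illustrate the idea of tabular Q-learning over a cyclic time-dependent state, the sketch below indexes the Q-table by hour of day so that daily patterns are captured by the state itself. It is a minimal illustration only: the discretization sizes, the assumed price curve, the one-level-per-step battery transition, and the reward definition are placeholders and are not the paper's exact formulation.

```python
# Minimal sketch: tabular Q-learning over a cyclic time-dependent MDP for
# battery scheduling. All numbers and the environment model are illustrative
# assumptions, not the formulation used in the paper.
import numpy as np

rng = np.random.default_rng(0)

N_HOURS = 24          # cyclic time index: hour of day wraps around every 24 steps
N_SOC = 20            # discrete battery state-of-charge levels (assumed size 20)
N_ACTIONS = 3         # 0 = discharge, 1 = idle, 2 = charge
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1

# Q-table indexed by (hour of day, state of charge, action)
Q = np.zeros((N_HOURS, N_SOC, N_ACTIONS))

def price(hour):
    # Assumed daily price pattern with an evening peak around hour 18
    return 0.10 + 0.05 * np.cos(2 * np.pi * (hour - 18) / 24)

def step(hour, soc, action):
    """Illustrative transition: charging buys energy at the current price,
    discharging offsets purchased energy; SOC moves at most one level per step."""
    delta = action - 1                       # -1, 0, +1 SOC levels
    next_soc = int(np.clip(soc + delta, 0, N_SOC - 1))
    energy = next_soc - soc                  # levels actually moved
    reward = -price(hour) * energy           # pay to charge, save by discharging
    next_hour = (hour + 1) % N_HOURS         # cyclic time dependence
    return next_hour, next_soc, reward

hour, soc = 0, N_SOC // 2
for t in range(200_000):
    # epsilon-greedy action selection
    if rng.random() < EPS:
        a = int(rng.integers(N_ACTIONS))
    else:
        a = int(np.argmax(Q[hour, soc]))
    nh, ns, r = step(hour, soc, a)
    # standard Q-learning update
    Q[hour, soc, a] += ALPHA * (r + GAMMA * Q[nh, ns].max() - Q[hour, soc, a])
    hour, soc = nh, ns

print("Greedy action by hour at mid SOC:", np.argmax(Q[:, N_SOC // 2, :], axis=1))
```

Because the hour index wraps modulo 24, the learned policy can charge in low-price hours and discharge near the assumed evening peak without any explicit forecast, which is the intuition behind the cyclic time-dependent state design.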