Abstract

Reinforcement Learning (RL) provides a pathway for efficiently utilizing battery storage in a microgrid. However, the traditional value-based RL algorithms used in battery management formulate policies from the expectation of the reward rather than its probability distribution, so the scheduling strategy is driven solely by the expected reward. This paper focuses on a scheduling strategy based on the probability distribution of the rewards, which better reflects the uncertainties in the incoming dataset. Furthermore, prioritized experience replay is used to sample the training experience, enhancing the quality of learning by reducing bias. Results are obtained with different distributional RL algorithms, namely C51, Quantile Regression Deep Q-Network (QR-DQN), Fully Parameterized Quantile Function (FQF), Implicit Quantile Networks (IQN) and Rainbow, and are compared with the traditional deep Q-learning algorithm with prioritized experience replay. The convergence results on the training dataset are further analyzed by varying the action space, using randomized experience replay, and excluding the tariff-based action while enforcing penalties for violating battery state-of-charge (SoC) limits. The best-trained Q-network is tested with different load and photovoltaic (PV) profiles to obtain the battery operation and costs, and the performance of the distributional RL algorithms is analyzed under different schemes of Time of Use (ToU) tariff. QR-DQN with prioritized experience replay is found to be the best-performing algorithm in terms of convergence on the training dataset, with the least fluctuation on the validation dataset and in battery operation across the different tariff regimes during the day.
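
The abstract does not give implementation details; the sketch below is only an illustration of the two ingredients it names, a QR-DQN quantile head trained with the quantile Huber loss and per-transition importance-sampling weights as supplied by prioritized experience replay. All dimensions, layer sizes, and hyperparameters (STATE_DIM, N_ACTIONS, N_QUANTILES, KAPPA) are assumed placeholders, not values from the paper.

```python
import torch
import torch.nn as nn

# Assumed, illustrative settings (not from the paper).
N_QUANTILES = 51   # number of return quantiles per action
STATE_DIM = 4      # e.g. SoC, PV power, load, ToU period (assumed)
N_ACTIONS = 5      # discretised charge/discharge power levels (assumed)
KAPPA = 1.0        # Huber threshold

class QuantileQNetwork(nn.Module):
    """Outputs N_QUANTILES return quantiles per action instead of a single Q-value."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, N_ACTIONS * N_QUANTILES),
        )

    def forward(self, state):                              # state: (batch, STATE_DIM)
        return self.net(state).view(-1, N_ACTIONS, N_QUANTILES)

def quantile_huber_loss(pred, target, kappa=KAPPA):
    """Per-transition quantile Huber loss between predicted and target quantiles.
    pred, target: (batch, N_QUANTILES). Returns a loss of shape (batch,)."""
    taus = (torch.arange(N_QUANTILES, dtype=torch.float32) + 0.5) / N_QUANTILES
    td = target.unsqueeze(1) - pred.unsqueeze(2)           # pairwise TD errors (batch, Np, Nt)
    huber = torch.where(td.abs() <= kappa,
                        0.5 * td ** 2,
                        kappa * (td.abs() - 0.5 * kappa))
    loss = (taus.view(1, -1, 1) - (td.detach() < 0).float()).abs() * huber / kappa
    return loss.mean(dim=2).sum(dim=1)                     # mean over target samples, sum over quantiles

# Toy usage with random data standing in for sampled transitions.
net = QuantileQNetwork()
states = torch.randn(8, STATE_DIM)
quantiles = net(states)                                    # (8, N_ACTIONS, N_QUANTILES)
q_values = quantiles.mean(dim=2)                           # act greedily on the expected return
actions = q_values.argmax(dim=1)
chosen = quantiles[torch.arange(8), actions]               # quantiles of the chosen actions
targets = torch.randn(8, N_QUANTILES)                      # stand-in for Bellman target quantiles
weights = torch.ones(8)                                    # importance-sampling weights from prioritized replay
loss = (weights * quantile_huber_loss(chosen, targets)).mean()
loss.backward()
```

In a prioritized-replay setup the per-transition losses would also be fed back as new priorities, and the weights would correct the sampling bias the abstract refers to; here they are set to one purely for illustration.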
