Operation strategy optimization of combined cooling, heating, and power systems with energy storage and renewable energy based on deep reinforcement learning

Yingjun Ruan, Zhenyu Liang, Fanyue Qian, Hua Meng, Yuan Gao

doi:10.1016/j.jobe.2022.105682

Abstract

Combined cooling, heating, and power (CCHP) coupled with renewable energy generation and energy storage can achieve a low-carbon, multi-energy complementary, and flexible energy system. However, the inclusion of renewable resources and energy storage poses significant challenges to the operational management of such systems. Conventional algorithms are limited when solving nonlinear, uncertain, non-convex optimization problems; deep reinforcement learning (DRL) is therefore considered the most effective method for these problems because of its powerful nonlinear fitting ability, model-free operation, and capability of solving decision-making problems. This study proposes a novel operation strategy optimization model based on DRL to minimize the operation cost of an energy system that consists of CCHP, photovoltaic generation, and an energy storage system. The optimization problem was transformed into a Markov decision process (MDP), and the state space and action space were modeled differently for the summer and winter scenarios. Two models for handling action constraints were proposed and their performance was tested. Two DRL algorithms, deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3), were each verified by comparison with conventional algorithms, namely particle swarm optimization (PSO) and mathematical programming under perfect input conditions. The results show that setting separate training environments for summer and winter yields better optimization results, and the model that performed better in handling action constraints was validated in terms of both the operation strategy and the optimization results. The performance of the TD3 method is comparable to the theoretical benchmark, with an average error of approximately 5%. The computation time is only 0.001 s for a single-step online decision and only 0.006 s for a 24-step online decision, which significantly improves operational efficiency and demonstrates the adaptability of DRL methods in terms of both optimization and computational performance.

Full Text