Owing to the flexibility they provide, Decentralized Energy Systems (DES) play a central role in integrating renewable energies. To utilize renewable energy efficiently, dispatchable components must be operated so as to bridge the time gap between inflexible supply and energy demand. Given the large number of energy converters, energy storage systems, and flexible consumers, there are many ways to achieve this. Conventional rule-based dispatch strategies often reach their limits here, and optimized dispatch strategies (e.g., model predictive control (MPC) or optimal dispatch) usually rely on very good forecasts. Reinforcement learning, particularly the application of Artificial Neural Networks (ANN), offers the possibility of learning such complex decision-making processes. Since long training times are required, an efficient training framework is needed. The present paper proposes different training methods for learning an ANN-based dispatch strategy. In Method I, the ANN attempts to learn the solution of the corresponding optimal dispatch problem. In Method II, the energy system model is simulated during training to compute the observation state and the operating costs resulting from the ANN-based dispatch. Method III uses the quickly computable Method I solution as a warm-up for the computationally expensive training with Method II. A model-based analysis compares the different ANN-based dispatch strategies with rule-based dispatch strategies, MPC-based dispatch, and optimal dispatch with respect to computational efficiency and the resulting operating costs. The dispatch strategies are compared in three case studies with different system topologies, to which training and test data are applied. Training Method I proved to be non-competitive. Training Methods II and III, however, significantly outperformed rule-based dispatch strategies across all case studies for both the training and the test data sets. Notably, Methods II and III also surpassed MPC-based dispatch under high- and medium-uncertainty forecasts on the training data in the first two case studies. In the third case study, MPC-based dispatch was superior, likely due to the system's higher complexity; it was also superior on the test data set owing to its methodological advantage of being optimized for each specific data set. The effectiveness of training Method III depends on the performance of the warm-up training with Method I: the warm-up is beneficial only if it already yields a promising dispatch (as seen in case study two). Otherwise, training Method II proves more effective, as observed in case studies one and three.