Abstract

As an efficient way to integrate multiple distributed energy resources (DERs) and the user side, a microgrid mainly faces the problems of the small scale, volatility, intermittency and uncertainty of DERs, as well as uncertainty on the demand side. The traditional microgrid has a single form and cannot support flexible energy dispatch between a complex demand side and the microgrid. In response to this problem, an overall environment comprising wind power, thermostatically controlled loads (TCLs), energy storage systems (ESSs), price-responsive loads and the main grid is proposed. Centralized control of microgrid operation facilitates regulation of the reactive power and voltage of distributed generation and adjustment of the grid frequency. However, flexible loads tend to aggregate and create new demand peaks during electricity price valleys. Existing research accounts for the power constraints of the microgrid but fails to ensure a sufficient supply of electric energy for individual flexible loads. On the basis of the operation of the overall microgrid environment, this paper considers the response priority of each TCL and ESS unit so as to guarantee the power supply of the microgrid's flexible loads and minimize the electricity input cost. Finally, optimization of the simulated environment is formulated as a Markov decision process (MDP), and training combines offline and online stages. Because multithreaded training alone does not learn from historical data, learning efficiency is low. An asynchronous advantage actor–critic algorithm augmented with an experience replay memory (Memory A3C, M-A3C) is therefore introduced to address the data correlation and nonstationary distribution problems that arise during training. The multithreaded operation of M-A3C efficiently learns the priority allocation of demand-side resources and improves the flexibility of demand-side scheduling, greatly reducing the input cost. A comparison of the cost optimization results with those obtained with the proximal policy optimization (PPO) algorithm shows that the proposed algorithm performs better in terms of both convergence and economic performance.
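As a rough illustration of the scheduling problem described above, the sketch below encodes a single microgrid dispatch step as a Markov decision process. The state variables, action, component sizes and cost terms are simplified assumptions chosen for illustration, not the paper's exact model.

```python
# Minimal sketch of a microgrid scheduling MDP, loosely following the abstract.
# All component names, capacities, efficiencies and the cost terms are
# illustrative assumptions, not the paper's exact formulation.
import numpy as np

class MicrogridEnv:
    def __init__(self, horizon=24):
        self.horizon = horizon          # one dispatch step per hour
        self.t = 0
        self.soc = 0.5                  # ESS state of charge (fraction)

    def reset(self, wind_forecast, load_forecast, buy_price, sell_price):
        self.wind = wind_forecast       # kW per hour
        self.load = load_forecast       # kW per hour (TCLs + price-responsive loads)
        self.c_buy, self.c_sell = buy_price, sell_price
        self.t, self.soc = 0, 0.5
        return self._state()

    def _state(self):
        return np.array([self.t, self.wind[self.t], self.load[self.t],
                         self.c_buy[self.t], self.soc])

    def step(self, ess_power):
        """ess_power > 0 charges the battery, < 0 discharges (kW)."""
        # Assumed 90% round-trip efficiency and 100 kWh capacity.
        self.soc = np.clip(self.soc + 0.9 * ess_power / 100.0, 0.0, 1.0)
        # Net exchange with the main grid: positive means buying electricity.
        delta = self.load[self.t] + ess_power - self.wind[self.t]
        price = self.c_buy[self.t] if delta > 0 else self.c_sell[self.t]
        cost = price * delta            # negative cost = revenue from selling
        self.t += 1
        done = self.t >= self.horizon
        reward = -cost                  # the agent minimizes operating cost
        return (None if done else self._state()), reward, done
```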

Highlights

  • With the development of power systems for a variety of distributed energy sources, the traditional energy management system (EMS) is expected to develop into a new form called an integrated energy management system (IEMS)

  • E is the power generation cost of distributed energy resources (DERs); C_ESS is the depreciation cost of charging and discharging the battery in the energy storage system; T is the operating time period within a day; δ_t is the amount of electricity exchanged between the microgrid and the main grid at time t, where δ_t > 0 means buying electricity from the main grid and δ_t < 0 means selling electricity to the main grid; c_b,t is the price of purchasing electricity from the main grid at time t; and c_s,t is the price of selling electricity to the main grid at time t (one plausible assembly of these terms into an objective is sketched after this list)

  • To verify the effectiveness of the improved M-A3C algorithm (Algorithm 1) proposed in this paper and the energy scheduling of the proposed model, Appendix A lists the parameters of each microgrid component
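
Putting the variable definitions from the second highlight together, one plausible reconstruction of the operating-cost objective is shown below; the exact expression and any additional terms should be taken from the full text.

```latex
% Plausible reconstruction of the operating-cost objective, assembled from the
% variable definitions above; the exact expression appears in the full text.
\min \; C \;=\; E \;+\; C_{\mathrm{ESS}}
      \;+\; \sum_{t=1}^{T} \Big( c_{b,t}\,\max(\delta_t, 0)
      \;-\; c_{s,t}\,\max(-\delta_t, 0) \Big)
```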


Summary

Introduction

With the development of power systems incorporating a variety of distributed energy sources, the traditional energy management system (EMS) is expected to develop into a new form called an integrated energy management system (IEMS). Reference [7] simulates a microgrid environment combining battery energy storage with hydrogen storage devices and uses the deep Q-network (DQN) reinforcement learning method to complete energy scheduling optimization. Reference [16] proposed a method for optimizing integrated energy economic dispatch by applying the deep deterministic policy gradient (DDPG) algorithm to renewable energy while considering the time-varying characteristics of the load; this method addresses policy learning over continuous state and action spaces. From the perspective of microgrid optimization algorithms, and building on existing research, an experience replay pool is introduced on the basis of the A3C algorithm to form M-A3C. This algorithm handles the management and training of high-dimensional states, improves the speed of convergence, and achieves higher model performance and convergence to optimal policies. The M-A3C algorithm is compared with other algorithms (DQN, PPO, double DQN), and it is verified that its policy optimization ability is better than that of the other algorithms in the same environment.
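As a rough illustration of the idea of combining A3C with an experience replay memory, the sketch below shows one asynchronous worker update that mixes fresh rollout transitions with a replayed batch. The network sizes, hyperparameters and transition format are assumptions, not the paper's exact configuration.

```python
# Illustrative sketch of an A3C worker augmented with a shared experience
# replay memory (the M-A3C idea).  Hyperparameters and the transition format
# (state list, action index, reward, next state, done flag) are assumptions.
import random

import torch
import torch.nn as nn
import torch.nn.functional as F

class ActorCritic(nn.Module):
    def __init__(self, n_state, n_action, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(n_state, hidden), nn.ReLU())
        self.pi = nn.Linear(hidden, n_action)   # policy logits
        self.v = nn.Linear(hidden, 1)           # state value

    def forward(self, s):
        h = self.body(s)
        return self.pi(h), self.v(h)

def worker_update(global_net, local_net, optimizer, replay, batch, gamma=0.99):
    """One asynchronous update: store new transitions, then train on a replayed batch."""
    replay.extend(batch)                         # shared experience replay pool
    sample = random.sample(list(replay), min(len(replay), 32))
    s, a, r, s2, done = map(torch.tensor, zip(*sample))
    s, s2, r, done = s.float(), s2.float(), r.float(), done.float()

    logits, v = local_net(s)
    with torch.no_grad():
        _, v2 = local_net(s2)
    target = r + gamma * v2.squeeze(-1) * (1.0 - done)
    advantage = target - v.squeeze(-1)

    log_prob = F.log_softmax(logits, dim=-1).gather(1, a.long().unsqueeze(1)).squeeze(1)
    actor_loss = -(log_prob * advantage.detach()).mean()
    critic_loss = advantage.pow(2).mean()
    loss = actor_loss + 0.5 * critic_loss

    optimizer.zero_grad()
    loss.backward()
    # Push the local gradients into the shared global network (A3C-style),
    # then refresh the local copy from the global parameters.
    for lp, gp in zip(local_net.parameters(), global_net.parameters()):
        gp.grad = lp.grad
    optimizer.step()
    local_net.load_state_dict(global_net.state_dict())
```

In this sketch, each worker thread would keep its own environment and local network while sharing the global network, optimizer and replay pool; sampling from the pool decorrelates the training batches, which is the motivation given for adding the replay memory to A3C.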

Microgrid Structure and Equipment Model
Objective Function
Interaction Power Constraints between Microgrid and Main Grid
Model of Each Component of the Microgrid
Interaction Mechanism between Main Network and Microgrid
Microgrid Management Reinforcement Learning Scheme
State Space
Action Space
Reward Function
M-A3C Network Structure
Basic Data
Analysis of Results
Comparison of Algorithms
Discussion