This paper introduces a novel approach to the overestimation of state-action values in reinforcement learning algorithms built on the Q-learning framework. It combines a double Q-learning algorithm with the grey wolf optimizer, a swarm-intelligence method, to search rapidly for optimal allocations in unknown search spaces. The result is a multi-agent collaborative Area-Grid Coordination (A-GC) strategy, termed the grey wolf double Q (GWDQ) strategy, tailored to multi-area interconnected energy systems. The GWDQ strategy is evaluated through simulations on integrated energy system models, including a mixed gas turbine system, a Combined Cooling, Heating, and Power (CCHP) system, and a multi-area energy-interconnected Northeast power grid model. A centralized architecture is established to analyze the optimization effects of digital advertising. The GWDQ strategy is compared with traditional reinforcement learning algorithms using both simulation and empirical data. Results indicate that GWDQ exhibits stronger learning capability, better stability, and improved control performance relative to these traditional methods. It also achieves superior optimization for digital advertising and rapidly reaches optimal coordination in A-GC processes across multiple regions. Finally, the paper analyzes the environmental and economic impacts of the proposed strategy.
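Double Q-learning is the standard remedy for the overestimation bias of Q-learning that this work targets. The following is a minimal sketch of a generic tabular double Q-learning update (van Hasselt, 2010), assuming discrete states and actions. It is not the GWDQ strategy itself, which additionally couples the learner with grey wolf optimization; the function names (`double_q_update`, `greedy_action`) and parameters (`alpha`, `gamma`) are hypothetical and chosen for illustration. One table selects the greedy next action while the other evaluates it, which decouples selection from evaluation and reduces the upward bias of the single-table max operator.

```python
import random
from collections import defaultdict


def double_q_update(q_a, q_b, state, action, reward, next_state, actions,
                    alpha=0.1, gamma=0.9, rng=random):
    """Apply one tabular double Q-learning step and return the updated value.

    Hypothetical helper: q_a and q_b are defaultdict(float) tables keyed by
    (state, action). With probability 0.5 table A is updated using table B
    to evaluate A's greedy next action, and vice versa.
    """
    if rng.random() < 0.5:
        select, evaluate = q_a, q_b
    else:
        select, evaluate = q_b, q_a
    # The table being updated picks the greedy next action ...
    best_next = max(actions, key=lambda a: select[(next_state, a)])
    # ... but the *other* table supplies its value, curbing overestimation.
    target = reward + gamma * evaluate[(next_state, best_next)]
    select[(state, action)] += alpha * (target - select[(state, action)])
    return select[(state, action)]


def greedy_action(q_a, q_b, state, actions):
    """Act greedily with respect to the sum of both tables (hypothetical)."""
    return max(actions, key=lambda a: q_a[(state, a)] + q_b[(state, a)])


if __name__ == "__main__":
    q_a, q_b = defaultdict(float), defaultdict(float)
    # Toy two-action transition repeated a few times, for illustration only.
    for _ in range(50):
        double_q_update(q_a, q_b, "s0", 1, 1.0, "s1", [0, 1])
    print(greedy_action(q_a, q_b, "s0", [0, 1]))
```

In a multi-agent setting each area controller would typically keep its own pair of tables; how GWDQ shares or initializes them, and where the grey wolf search enters, is specified in the full paper and not reproduced here.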