We develop a Reinforcement Learning (RL) approach to the supervisory control problem for advanced energy systems, such as novel nuclear reactors and other demand-driven, mission-critical, and component-health-sensitive energy plants. The problem formulation captures the stochastic interplay of plant performance, component health evolution, power demand from the grid, diverse maintenance actions, and operator-defined goals and constraints, all considered over sufficiently long reasoning horizons. Key aspects of the proposed approach are a technique inspired by receding horizon control that dictates time- or event-triggered (re-)construction of the supervisory policy; timescale compression, which handles long reasoning horizons and uncertainty in parts of the problem; and a practical yet demonstrably effective treatment of hybrid action spaces with continuous and discrete decision variables. In the resulting algorithm, a simulation-based RL agent constructs a stochastic supervisory control policy over a nontrivial action space and a long horizon, applies the learned policy to the system for a much shorter interval, and then repeats the process to construct the next long-horizon policy. That next policy is again applied only for a short interval; meanwhile, events that were originally far in the future move progressively closer, their associated uncertainty decreases, and new events and aspects enter the reasoning horizon. The proposed methodology bridges fundamental receding horizon concepts with the stronger and more scalable reasoning of contemporary RL. Numerical examples using Soft Actor–Critic deep RL illustrate the operation and efficacy of the proposed technique for a power plant tasked with health-aware load-following missions in a dynamic electricity market landscape.
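The plan-long, apply-short, repeat cycle described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy plant model `simulate`, the demand forecast, the reward, and all parameter values are hypothetical, and a plain random search over open-loop action sequences stands in for the Soft Actor–Critic agent and the stochastic policy it learns. The sketch shows only the receding horizon loop and a hybrid action (continuous power setpoint plus a discrete maintenance decision); it does not model timescale compression or shrinking forecast uncertainty.

```python
import random


def simulate(health, plan, demand):
    """Hypothetical toy plant: returns (total reward, final health)."""
    total = 0.0
    for (power, maintain), d in zip(plan, demand):
        if maintain:
            # Maintenance restores health but produces no power this step.
            health = min(1.0, health + 0.3)
            output = 0.0
        else:
            # Operating degrades health in proportion to the setpoint.
            health = max(0.0, health - 0.05 * power)
            output = power * health
        total -= abs(d - output)  # penalize the load-following error
    return total, health


def plan_long_horizon(health, demand, rng, n_candidates=200):
    """Stand-in for the RL agent: random search over hybrid action sequences."""
    best_plan, best_score = None, float("-inf")
    for _ in range(n_candidates):
        # Hybrid action: (continuous power setpoint, discrete maintain flag).
        plan = [(rng.uniform(0.0, 1.0), rng.random() < 0.1) for _ in demand]
        score, _ = simulate(health, plan, demand)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan


def receding_horizon_control(total_steps=30, horizon=12, apply_steps=3, seed=0):
    """Plan over a long horizon, apply only a short prefix, then re-plan."""
    rng = random.Random(seed)
    # Hypothetical demand profile, long enough to fill every planning window.
    demand = [0.5 + 0.4 * rng.random() for _ in range(total_steps + horizon)]
    health, t, applied = 1.0, 0, []
    while t < total_steps:
        forecast = demand[t:t + horizon]              # long reasoning horizon
        plan = plan_long_horizon(health, forecast, rng)
        n = min(apply_steps, total_steps - t)         # short application interval
        _, health = simulate(health, plan[:n], forecast[:n])
        applied.extend(plan[:n])
        t += n                                        # the window slides forward
    return applied, health


if __name__ == "__main__":
    actions, final_health = receding_horizon_control()
    print(len(actions), round(final_health, 3))
```

In this sketch re-planning is purely time-triggered (every `apply_steps` steps); an event-triggered variant would additionally re-plan whenever a monitored condition, such as a health threshold crossing, occurs inside the application interval.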