Abstract

This study presents a novel guiding framework that embeds robust optimization (RO) into the training phase of reinforcement learning (RL), tailored to the dynamic scheduling of a single-stage multi-product chemical reactor under uncertainty. The proposed framework addresses the tendency of policy gradient methods to become trapped in local optima by integrating optimization methods into training, improving both the objective value and the learning stability of the RL agent. Incorporating RO as the guiding engine further strengthens the robustness of the proposed model against parameter distortions. A numerical study of chemical material production scheduling is conducted to validate the proposed model, and the results demonstrate its effectiveness in addressing demand volatility, evaluated against a typical actor-critic RL baseline through sensitivity analysis, solution quality analysis, and computational time.
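To make the idea of an optimization-guided policy gradient concrete, the sketch below shows one possible form of such an update: a standard advantage-weighted actor-critic step augmented with an imitation term that pulls the policy toward the action suggested by an RO oracle. This is a minimal illustration under assumed details, not the authors' implementation; the environment, the function names (`ro_guide_action`, `reward_fn`), the linear policy/critic, and the weighting constant are all hypothetical.

```python
# Minimal sketch (illustrative assumptions, not the paper's implementation):
# a policy-gradient update whose loss mixes the usual actor-critic term with a
# "guiding" term that increases the likelihood of the action an RO engine would take.
import numpy as np

rng = np.random.default_rng(0)
N_PRODUCTS, STATE_DIM = 3, 4           # products to schedule, features per state (assumed)
theta = rng.normal(scale=0.1, size=(STATE_DIM, N_PRODUCTS))  # linear softmax policy weights
w = np.zeros(STATE_DIM)                # linear value-function (critic) weights
ALPHA, BETA, LAMBDA = 0.05, 0.05, 0.5  # actor lr, critic lr, weight of the RO guiding term

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def ro_guide_action(state):
    """Placeholder for the RO engine: index of the product a robust schedule
    would produce next. Here, simply the product with the largest demand feature."""
    return int(np.argmax(state[:N_PRODUCTS]))

def reward_fn(state, action):
    """Toy reward: profit proportional to the satisfied demand of the chosen product."""
    return float(state[action])

for episode in range(500):
    state = rng.uniform(0.0, 1.0, size=STATE_DIM)   # e.g. demand and inventory features
    probs = softmax(state @ theta)
    action = rng.choice(N_PRODUCTS, p=probs)
    reward = reward_fn(state, action)

    # Critic: one-step episode, so the TD target is just the observed reward.
    td_error = reward - state @ w
    w += BETA * td_error * state

    # Actor: policy gradient weighted by the TD error (advantage estimate) ...
    grad_log_pi = np.outer(state, -probs)
    grad_log_pi[:, action] += state
    # ... plus an imitation gradient toward the RO-suggested action (the "guiding" term).
    guide = ro_guide_action(state)
    grad_guide = np.outer(state, -probs)
    grad_guide[:, guide] += state
    theta += ALPHA * (td_error * grad_log_pi + LAMBDA * grad_guide)

print("Final action probabilities for a sample state:", softmax(state @ theta))
```

In this reading, LAMBDA trades off exploration by the RL agent against adherence to the robust schedule; the abstract does not specify how that trade-off is set, so the fixed constant here is only a placeholder.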
