Abstract

This study presents a novel guiding framework that embeds robust optimization (RO) into the training phase of reinforcement learning (RL), tailored to the dynamic scheduling of a single-stage multi-product chemical reactor in an uncertain environment. The framework addresses the tendency of policy gradient methods to become trapped in local optima by integrating optimization methods into training, improving both the objective value and the learning stability of the RL agent. We further strengthen the model's robustness against parameter distortions by using RO as the guiding engine. A numerical study of chemical material production scheduling validates the proposed model: compared with a typical actor-critic RL baseline, the results demonstrate its effectiveness in handling demand volatility across several analyses, including sensitivity analysis, solution quality, and computational time.
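To make the "RO as a guiding engine" idea concrete, the following is a minimal sketch of one way such guidance could enter an actor-critic update: alongside the usual policy-gradient and critic losses, a cross-entropy term pulls the policy toward the action an RO solver would schedule from the same state. This is an illustrative reading under our own assumptions, not the authors' exact formulation; the PyTorch model, the `guided_update` helper, the problem sizes, and the `ro_action` / `beta` inputs are all hypothetical.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Hypothetical setup: a single-stage reactor picks which of N_PRODUCTS
# to schedule next from a small state (inventories + demand forecast).
N_PRODUCTS, STATE_DIM = 3, 6

class ActorCritic(nn.Module):
    def __init__(self):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.Tanh())
        self.actor = nn.Linear(64, N_PRODUCTS)   # policy logits
        self.critic = nn.Linear(64, 1)           # state-value estimate

    def forward(self, state):
        h = self.shared(state)
        return Categorical(logits=self.actor(h)), self.critic(h).squeeze(-1)

def guided_update(model, opt, state, action, advantage, ro_action, beta=0.5):
    """One actor-critic step with an RO guidance term.

    `ro_action` is the product a robust-optimization solver would
    schedule from `state`; the extra cross-entropy term nudges the
    policy toward that reference solution (our assumption of how the
    guiding engine could be wired in).
    """
    dist, value = model(state)
    pg_loss = -dist.log_prob(action) * advantage           # actor loss
    # Critic regresses toward target = value.detach() + advantage.
    v_loss = (advantage + value.detach() - value) ** 2
    guide_loss = -dist.log_prob(ro_action)                 # imitate RO solution
    loss = (pg_loss + 0.5 * v_loss + beta * guide_loss).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

model = ActorCritic()
opt = torch.optim.Adam(model.parameters(), lr=3e-4)
state = torch.randn(1, STATE_DIM)
guided_update(model, opt, state,
              action=torch.tensor([1]), advantage=torch.tensor([0.8]),
              ro_action=torch.tensor([2]))
```

Weighting the guidance term by `beta` lets it dominate early training (steering the policy away from poor local optima) and fade later, which is one plausible way to balance the RO reference against the agent's own exploration.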
