Abstract

Reinforcement learning (RL) is a promising solution for difficult decision-making problems, such as inventory management in chemical supply chains. However, enabling RL to explicitly consider known environment constraints is crucial for safe deployment in practical applications. This work incorporates recent tools for optimization over trained neural networks to introduce two algorithms for safe training and deployment of RL, with a focus on supply chains. Specifically, we use optimization over trained neural-network state–action value functions (i.e., critic functions) to directly incorporate constraints when computing actions in a continuous action space. Furthermore, we introduce a second algorithm that guarantees constraint satisfaction during deployment by directly implementing actions from constrained optimization of a trained value function. The algorithms are compared against the state-of-the-art algorithms TRPO, CPO, and RCPO using a computational supply chain case study.
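
As a rough illustration of the action-selection idea described above, the sketch below chooses a continuous action by maximizing a trained critic Q(s, a) subject to a simple supply-chain-style constraint (non-negative order quantities whose total stays within a capacity limit). The critic architecture, the specific constraint, and the projected gradient-ascent routine are illustrative assumptions for this sketch only; the paper formulates constrained optimization over the trained network directly rather than this particular heuristic.

```python
# Illustrative sketch (assumptions noted above): pick an action by optimizing a
# trained critic Q(s, a) over the action space under a capacity constraint.
import torch
import torch.nn as nn


class Critic(nn.Module):
    """State-action value network Q(s, a) (architecture is an assumption)."""

    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def restore_feasibility(action, capacity):
    """Clip to non-negative orders and rescale so the total order stays within
    capacity (a simple feasibility heuristic, not an exact Euclidean projection)."""
    a = action.clamp(min=0.0)
    total = a.sum()
    if total > capacity:
        a = a * (capacity / total)
    return a


def constrained_action(critic, state, action_dim, capacity, steps=200, lr=0.05):
    """Maximize Q(s, a) over a by projected gradient ascent; the critic stays frozen."""
    action = torch.full((action_dim,), capacity / (2 * action_dim), requires_grad=True)
    opt = torch.optim.Adam([action], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -critic(state, action).squeeze()  # ascend on Q by descending on -Q
        loss.backward()
        opt.step()
        with torch.no_grad():
            action.copy_(restore_feasibility(action, capacity))
    return action.detach()


if __name__ == "__main__":
    state_dim, action_dim, capacity = 8, 3, 10.0
    critic = Critic(state_dim, action_dim)  # in practice, trained during RL
    state = torch.randn(state_dim)
    a = constrained_action(critic, state, action_dim, capacity)
    print("action:", a, "total order:", a.sum().item(), "<= capacity", capacity)
```

In this toy setup the constraint is enforced by restoring feasibility after each gradient step, so the returned action always satisfies it; mixed-integer formulations over the trained (piecewise-linear) network, as referenced in the abstract, would instead encode the constraints directly in the optimization problem.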
