Abstract
Multi-echelon inventory systems are commonly used in practice to meet widely distributed, random demand for spare parts efficiently and cost-effectively. Optimizing a multi-echelon inventory system is a decision-making problem under uncertainty. Because transshipment creates interdependencies among warehouses, classic inventory policies such as (s, S) and (R, Q), which ignore the inventory positions of other warehouses, become suboptimal. The Markov decision process (MDP) is an effective tool for inventory optimization because it does not require a predetermined, parameterized policy structure. Unfortunately, both the state and action spaces of the MDP suffer from the curse of dimensionality as the number of warehouses increases. This paper optimizes the inventory of a large-scale multi-echelon inventory system using a new multi-agent deep reinforcement learning (MADRL) algorithm, EM-VDTD3, which is developed by introducing value decomposition and experience buffer modification into the twin delayed deep deterministic policy gradient (TD3) algorithm. Each agent in EM-VDTD3 manages one subsystem of the multi-echelon inventory system. Because all agents share the same network parameters, the networks are customized so that they can process subsystems with different parameters. Domain knowledge of inventory control is embedded in the learning process of EM-VDTD3 by adding expert experiences to the experience buffer, and an efficient approximate method is developed to identify a teacher policy that generates these expert experiences. Numerical studies of a spare-part inventory system in the wind energy industry show that the proposed EM-VDTD3 outperforms benchmark methods.
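To make the value decomposition and the expert-augmented experience buffer more concrete, here is a minimal Python sketch under stated assumptions. The abstract does not specify the form of the value decomposition or how expert experiences are mixed into training batches, so this sketch assumes an additive (VDN-style) decomposition in which the team value is the sum of per-agent values, and a buffer that keeps a fixed pool of teacher-generated expert transitions alongside a bounded FIFO of agent transitions, drawing a fixed fraction of each batch from the expert pool. All names (`Transition`, `MixedReplayBuffer`, `expert_ratio`, `decomposed_td3_target`) are hypothetical; this is an illustration, not the paper's EM-VDTD3 implementation.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Transition:
    # Hypothetical layout: one entry per agent (subsystem) for observations
    # and actions, plus a single shared team reward.
    obs: Tuple[float, ...]
    actions: Tuple[float, ...]
    reward: float
    next_obs: Tuple[float, ...]


class MixedReplayBuffer:
    """Assumed design: agent experiences in a bounded FIFO plus a fixed pool
    of expert experiences produced by a teacher policy."""

    def __init__(self, capacity: int, expert: Sequence[Transition], expert_ratio: float):
        self.agent = deque(maxlen=capacity)  # oldest agent transitions are evicted first
        self.expert = list(expert)           # expert transitions are never evicted
        self.expert_ratio = expert_ratio     # fraction of each batch taken from the expert pool

    def add(self, t: Transition) -> None:
        self.agent.append(t)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Draw the expert share first, capped by the size of the expert pool,
        # then fill the rest of the batch from the agent's own experiences.
        n_expert = min(int(round(batch_size * self.expert_ratio)), len(self.expert))
        n_agent = min(batch_size - n_expert, len(self.agent))
        return rng.sample(self.expert, n_expert) + rng.sample(list(self.agent), n_agent)


def decomposed_td3_target(reward: float, next_q1: Sequence[float],
                          next_q2: Sequence[float], gamma: float) -> float:
    """Team TD target assuming additive value decomposition.

    next_q1 / next_q2 hold per-agent values from TD3's two target critics,
    evaluated at the (already smoothed) target actions. Each critic's team
    value is the sum over agents; the clipped double-Q rule then takes the
    smaller of the two team values. No terminal mask is applied.
    """
    team_q1 = sum(next_q1)
    team_q2 = sum(next_q2)
    return reward + gamma * min(team_q1, team_q2)
```

Taking the minimum over the two summed critics, rather than per agent, is only one possible way to combine TD3's clipped double-Q target with value decomposition. The terminal mask and TD3's target policy smoothing noise are left out because the sketch takes the target critics' outputs as given.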