Abstract

Inventory cost is a significant factor in supply chain management (SCM), and an effective replenishment strategy can reduce warehouse operating costs. However, traditional replenishment strategies often struggle to meet the complex and ever-changing demands of real-world warehouse scenarios. Moreover, the spatiotemporal heterogeneity of commodity demand and inventory cost poses significant challenges to time series prediction models, because training a separate strategy for each commodity substantially increases modeling and time costs. To address these issues, we propose a replenishment model called IACPPO, which combines the Advantage Actor-Critic (A2C) algorithm with the Proximal Policy Optimization (PPO) algorithm. First, we introduce a gated recurrent unit (GRU) and an attention mechanism into the Actor-Critic network to analyze the state space for probabilistic modeling and to extract valid information from environmental state sequences through memory-based reasoning and a focus on critical state sequences. Second, we fuse the A2C algorithm with the PPO algorithm and train the whole network jointly to obtain the replenishment strategy. Finally, experimental results on two real-world inventory datasets show that IACPPO achieves the best cost control in most of the replenishment experiments.
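
The abstract does not state the loss that IACPPO optimizes. As a rough illustration of what pairing an A2C-style advantage with a PPO update can look like, the sketch below uses the textbook one-step advantage and the textbook PPO clipped surrogate in plain Python. The function names, the discount `gamma`, and the clip range `eps` are assumptions made for illustration and are not taken from the paper; the GRU and attention layers are omitted entirely.

```python
import math


def one_step_advantages(rewards, values, gamma=0.99):
    """A2C-style advantage: A_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    `values` must hold one more entry than `rewards`; the extra entry is
    the critic's bootstrap estimate for the state after the last step.
    Rewards here would typically be negative inventory costs (assumption).
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have len(rewards) + 1 entries")
    return [r + gamma * values[t + 1] - values[t] for t, r in enumerate(rewards)]


def ppo_clip_loss(new_logp, old_logp, advantages, eps=0.2):
    """Negative PPO clipped surrogate, averaged over the batch.

    ratio = pi_new(a|s) / pi_old(a|s), computed from log-probabilities.
    The surrogate takes the minimum of the unclipped and clipped terms,
    so the policy gains nothing by moving the ratio outside [1-eps, 1+eps].
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - eps, min(1.0 + eps, ratio))
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Hypothetical three-step replenishment episode: rewards are negative costs.
adv = one_step_advantages([-4.0, -2.0, -3.0], [-8.0, -6.0, -2.0, 0.0], gamma=1.0)
loss = ppo_clip_loss([-1.0, -1.0, -1.0], [-1.0, -1.0, -1.0], adv)
```

In a full actor-critic setup this policy term would usually be combined with a value-function loss for the critic, which is one plausible reading of training "the whole network jointly".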
