Abstract

This paper presents our approach to the centralized control of a distributed inventory management system using reinforcement learning (RL). We propose the application of policy-based reinforcement learning algorithms as an effective way to tackle this problem. We formulate the problem as a Markov decision process (MDP) and construct an environment that tracks multiple products across multiple warehouses, returning a reward signal at every time step that corresponds directly to the total revenue across all warehouses. In this environment, we apply various policy-based reinforcement learning algorithms, such as Advantage Actor-Critic (A2C), Trust Region Policy Optimization (TRPO), and Proximal Policy Optimization (PPO), to decide the quantity of each product to stock in every warehouse. We evaluate how well these algorithms maximize average revenue over time under various statistical distributions from which demand is sampled at each time step of each training episode. We also compare these approaches against an existing baseline that uses a fixed replenishment scheme. We conclude by discussing the results of our evaluation and the scope for future work on the topic.
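To make the formulation concrete, below is a minimal, self-contained sketch of the kind of environment the abstract describes: stock levels for several products across several warehouses form the state, a replenishment quantity per (warehouse, product) pair forms the action, and the per-step reward is revenue from sales. The warehouse and product counts, the prices and costs, and the choice of a Poisson demand distribution are all illustrative assumptions for this sketch, not the paper's actual parameters.

import numpy as np

class MultiWarehouseInventoryEnv:
    # Toy centralized inventory MDP: a single agent restocks every warehouse.
    def __init__(self, n_warehouses=2, n_products=3, capacity=100,
                 price=5.0, unit_cost=2.0, holding_cost=0.1, seed=0):
        self.capacity = capacity          # max stock per (warehouse, product); assumed
        self.price = price                # revenue per unit sold; assumed
        self.unit_cost = unit_cost        # cost per replenished unit; assumed
        self.holding_cost = holding_cost  # carrying cost per unit per step; assumed
        self.shape = (n_warehouses, n_products)
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        # State: current stock level of every product in every warehouse.
        self.stock = np.full(self.shape, 50.0)
        return self.stock.copy()

    def step(self, action):
        # Action: replenishment quantity per (warehouse, product) pair,
        # clipped so no stock level can exceed warehouse capacity.
        order = np.clip(action, 0, self.capacity - self.stock)
        self.stock += order
        # Demand is sampled fresh at every time step; the paper evaluates
        # several distributions, and Poisson is just one plausible choice.
        demand = self.rng.poisson(lam=20, size=self.shape)
        sales = np.minimum(self.stock, demand)
        self.stock -= sales
        # Reward mirrors the abstract: total revenue across all warehouses,
        # here taken net of ordering and holding costs (an assumption).
        reward = (self.price * sales.sum()
                  - self.unit_cost * order.sum()
                  - self.holding_cost * self.stock.sum())
        return self.stock.copy(), float(reward)

# The fixed replenishment baseline mentioned in the abstract can be
# expressed as a simple order-up-to rule in the same environment:
env = MultiWarehouseInventoryEnv()
state = env.reset()
total = 0.0
for _ in range(52):
    action = np.maximum(80.0 - state, 0.0)  # restock every item up to 80 units
    state, reward = env.step(action)
    total += reward
print(f"baseline revenue over 52 steps: {total:.1f}")

A policy-gradient agent such as A2C, TRPO, or PPO would replace the fixed order-up-to rule above with a learned mapping from observed stock levels to order quantities; in practice, the environment would be wrapped in a standard RL interface (e.g. Gymnasium) so that off-the-shelf implementations of these algorithms can be trained against it.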
