Abstract

Resource planning and management are essential strategic practices in the supply chain. Resource allocation problems are becoming more complex due to the dynamic nature of these logistical systems. With the increasing popularity of Deep Reinforcement Learning (DRL) algorithms in gaming and robotics, scholars have started investigating their potential for addressing supply chain concerns. Nevertheless, the use of DRL-based approaches for supply chain optimization problems remains largely unexplored. We therefore present a systematic literature analysis investigating the adoption of DRL solution techniques for supply chain optimization problems. Building on this analysis, we propose a novel method for addressing allocation problems in the supply chain based on two DRL algorithms, Asynchronous Advantage Actor-Critic (A3C) and Proximal Policy Optimization (PPO). A simulation model is developed to train and evaluate these algorithms. We transform numerical data from the simulation into Gantt charts and pass them to the agents as observations. This formulation of the observation space is motivated by the fact that Deep Neural Networks (DNNs) are well suited to image-based analysis. The computational results demonstrate that both algorithms successfully learn allocation policies from image-based observations while considering multiple objective values. The conducted experiments further indicate that A3C achieves a more stable allocation policy than PPO in minimizing all considered objective values.
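The image-based observation space described above can be sketched as a simple rasterization step: the simulation's schedule is drawn as a Gantt chart, where rows are resources and columns are time steps, and the resulting array is fed to the DNN policy. The task tuple format, shading scheme, and function names below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def gantt_to_image(tasks, num_resources, horizon):
    """Rasterize a schedule into a grayscale Gantt-chart observation.

    tasks: list of (resource, start, duration) tuples (assumed format).
    Returns a (num_resources, horizon) float32 array in [0, 1], where
    each occupied cell is shaded according to the task's index so that
    different allocations remain visually distinguishable to the DNN.
    """
    img = np.zeros((num_resources, horizon), dtype=np.float32)
    for idx, (resource, start, duration) in enumerate(tasks, start=1):
        end = min(start + duration, horizon)
        img[resource, start:end] = idx / len(tasks)  # distinct shade per task
    return img

# Example: three tasks allocated to two resources over a 10-step horizon.
obs = gantt_to_image([(0, 0, 3), (1, 2, 4), (0, 5, 2)],
                     num_resources=2, horizon=10)
print(obs.shape)  # (2, 10)
```

In a DRL setup such as A3C or PPO, an array like this (optionally stacked over several time steps or rendered in multiple channels per objective) would serve as the input to a convolutional policy network.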
