Robotic Mobile Fulfillment Systems (RMFS) are extensively employed in modern warehouses. In the era of booming e-commerce, this parts-to-picker model significantly reduces warehouse costs and enhances operational efficiency. The primary focus of this article is the scheduling of batch orders, which involves the simultaneous allocation of mobile robots and picking stations to orders to meet their requirements, with the aim of minimizing processing costs. While the optimization of RMFS has been extensively investigated in recent years, research on categorized storage remains limited. Furthermore, combinatorial optimization in scenarios with many orders, picking stations, and mobile robots remains a significant challenge. To overcome these challenges, we introduce the Enhanced Deep Reinforcement Learning method for Batch Order Scheduling (EDRL-OBOS), designed to minimize operational costs in RMFS. EDRL-OBOS features a uniquely designed state space that encompasses the allocation of all orders, with the variable portion of the state space shrinking as the algorithm progresses. Moreover, we replace conventional actions with heuristic rules, significantly improving the efficiency of the algorithm, and we define a reward function based on a greedy algorithm, ensuring that our method outperforms conventional approaches. In each episode, EDRL-OBOS assigns a picking station and a sequence of mobile robots to each order until all orders in the state space are fulfilled, and then computes the reward. All states are then reset to their initial conditions, and the order allocation scheme is refined using the experience learned in the previous episode, until convergence is reached. Finally, we conduct simulation experiments in which we vary the number of orders, picking stations, and mobile robots. The results indicate that EDRL-OBOS achieves cost reductions of up to 22.89%, at least 19.66%, and 21.57% on average across the tested scenarios. Its computational efficiency is also superior to that of traditional algorithms.
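To make the episode structure concrete, the following is a minimal, hypothetical sketch of the loop the abstract describes: the state holds the unassigned orders (the variable portion that shrinks each step), each action selects a heuristic rule that assigns a picking station and robot sequence to an order, the reward is measured against a greedy baseline, and the state resets after every episode. All names, the cost model, and the simple tabular policy are illustrative assumptions, not the paper's actual implementation.

```python
# Hypothetical sketch of the episode loop described in the abstract.
# Orders, stations, robots, heuristic rules, and the cost model are placeholders.
import random

N_ORDERS, N_STATIONS, N_ROBOTS = 20, 3, 5
HEURISTIC_RULES = ["nearest_station", "least_loaded_station", "shortest_robot_queue"]

def assign(order, rule):
    """Placeholder: map a heuristic rule to a (station, robot sequence) assignment."""
    station = random.randrange(N_STATIONS)
    robots = random.sample(range(N_ROBOTS), k=2)
    return station, robots

def cost(plan):
    """Placeholder processing cost of a complete allocation plan."""
    return sum(station + len(robots) for station, robots in plan.values()) + random.random()

def greedy_baseline_cost():
    """Placeholder greedy allocation used as the reward reference."""
    return cost({o: assign(o, HEURISTIC_RULES[0]) for o in range(N_ORDERS)})

# Tabular preferences over heuristic-rule actions (stand-in for the learned policy).
q = {rule: 0.0 for rule in HEURISTIC_RULES}
alpha, eps = 0.1, 0.2
baseline = greedy_baseline_cost()

for episode in range(200):
    unassigned = list(range(N_ORDERS))   # variable part of the state: shrinks each step
    plan, used_rules = {}, []
    while unassigned:                     # assign stations/robots until no orders remain
        order = unassigned.pop(0)
        rule = random.choice(HEURISTIC_RULES) if random.random() < eps else max(q, key=q.get)
        plan[order] = assign(order, rule)
        used_rules.append(rule)
    reward = baseline - cost(plan)        # reward relative to the greedy reference
    for rule in used_rules:               # learn from this episode, then the state resets
        q[rule] += alpha * (reward - q[rule])
```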