Simultaneous allocation and sequencing of orders for robotic mobile fulfillment system using reinforcement learning algorithm

Saravana Perumaal Subramanian, Selva Kumar Chandrasekar

doi:10.1016/j.eswa.2023.122262

Abstract

Robotic Mobile Fulfillment Systems (RMFS) can significantly benefit large e-commerce warehouse operations. To fulfill incoming orders, an RMFS deploys mobile robots that carry shelves back and forth between the storage area and the picking station. Allocating and sequencing orders for the mobile robots is a complex yet critical task, because the allocation and sequence chosen determine the distance the robots travel in fulfilling the orders. In this paper, a Simultaneous Allocation and Sequencing of Orders Reinforcement Learning (SASORL) algorithm is proposed to minimize the distance traveled by mobile robots. Unlike existing methods, the SASORL algorithm optimizes order allocation and sequencing concurrently, significantly reducing mobile robot travel distance. The algorithm is built on three sets: state, action, and reward/penalty. The state set comprises the orders fulfilled, whereas the action set contains the orders yet to be fulfilled. The distance traveled by the mobile robot as a result of the orders allocated and sequenced is taken as the penalty. As the algorithm simultaneously allocates and sequences orders to the mobile robots, the action set depletes, the state set grows, and the penalty is updated until the action set is empty. Each episode restarts with the experience learned in prior episodes, and after a few episodes the SASORL algorithm is capable of generating an optimal order allocation and sequence that commits the minimum travel distance to the mobile robots. The SASORL algorithm is superior to widely adopted soft computing techniques when orders are randomly distributed. This superiority is evidenced by a 26% reduction in the maximum distance traveled by all mobile robots and a 54% reduction in the standard deviation of the distance traveled by the mobile robots, at the cost of a marginal 7% increase in the total distance traveled by all mobile robots. Moreover, the SASORL algorithm outperforms soft computing techniques by 44% in terms of computation time.
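The state/action/penalty loop described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the update rule, so tabular Q-learning is swapped in here as an assumption. The grid coordinates, the station location `STATION`, the travel model in `step_cost`, and all function names and hyperparameters are likewise hypothetical. The penalty used is total travel distance, so the sketch does not model the maximum-distance or standard-deviation results reported above.

```python
import random
from collections import defaultdict

STATION = (0, 0)  # assumed picking-station location (not from the paper)


def manhattan(a, b):
    """Grid distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def step_cost(robot_pos, shelf):
    # Assumed travel model: drive to the shelf, carry it to the station,
    # then return it to its storage location (robot ends at the shelf).
    return manhattan(robot_pos, shelf) + 2 * manhattan(shelf, STATION)


def plan_distance(plan, orders):
    """Total distance of a plan given as one order sequence per robot."""
    total = 0
    for sequence in plan:
        pos = STATION
        for o in sequence:
            total += step_cost(pos, orders[o])
            pos = orders[o]
    return total


def open_actions(done, n_robots, n_orders):
    # Action set: every (robot, order) pair for orders not yet fulfilled.
    return [(r, o) for r in range(n_robots)
            for o in range(n_orders) if o not in done]


def train(orders, n_robots=2, episodes=500, alpha=0.5, gamma=1.0,
          epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # learned penalty estimates, kept across episodes
    best_plan, best_cost = None, float("inf")
    for _ in range(episodes):
        done = frozenset()                    # state set: fulfilled orders
        positions = (STATION,) * n_robots
        plan = [[] for _ in range(n_robots)]
        total = 0
        while len(done) < len(orders):        # until the action set is empty
            state = (done, positions)
            actions = open_actions(done, n_robots, len(orders))
            if rng.random() < epsilon:        # explore
                action = rng.choice(actions)
            else:                             # exploit: lowest estimated penalty
                action = min(actions, key=lambda a: q[(state, a)])
            r, o = action
            cost = step_cost(positions[r], orders[o])  # penalty for this step
            done = done | {o}
            positions = positions[:r] + (orders[o],) + positions[r + 1:]
            next_state = (done, positions)
            future = min((q[(next_state, a)]
                          for a in open_actions(done, n_robots, len(orders))),
                         default=0.0)
            # Q-learning update, minimizing accumulated distance.
            q[(state, action)] += alpha * (cost + gamma * future
                                           - q[(state, action)])
            plan[r].append(o)
            total += cost
        if total < best_cost:
            best_plan, best_cost = plan, total
    return best_plan, best_cost
```

Calling `train([(1, 2), (3, 1), (2, 4), (5, 5)])` returns one order sequence per robot together with the total distance of the best episode seen. Each action picks a robot and an order together, which is one way to read "simultaneous" allocation and sequencing: the per-robot lists in the plan record both which robot serves an order and when.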

Full Text