The Block Relocation Problem (BRP), also known as the Container Relocation Problem, is a challenging combinatorial optimization problem arising in block stacking systems, with many real-world applications in logistics and manufacturing. The BRP seeks the optimal sequence of moves for retrieving blocks from a storage area so as to minimize the number of relocations. The BRP has been studied extensively and has been solved primarily with conventional optimization techniques, including mathematical programming models and both exact and heuristic algorithms. This paper is the first to tackle the problem with reinforcement learning. We focus on a major variant of the BRP, the restricted BRP with duplicate priorities (RBRP-dup). We first model the RBRP-dup as a Markov decision process and then propose a Q-learning-based algorithm comprising two phases. In the learning phase, two novel mechanisms, an optimal-rule-integrated behaviour policy and a heuristic-based dynamic initialization method, are incorporated into the Q-learning model to reduce the size of the state-action space and accelerate convergence. In the optimization phase, the insights obtained during learning are combined with a heuristic algorithm to improve decision-making. We evaluate the proposed method against a state-of-the-art exact algorithm and a commonly used heuristic on benchmark instances from the literature. Computational experiments demonstrate the superiority of our method in terms of solution quality on large and complex instances.
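To make the MDP formulation concrete, the following is a minimal tabular Q-learning sketch for a toy restricted BRP with duplicate priorities: states are stack configurations, the only legal actions relocate the block on top of the target block's stack, and each relocation incurs a reward of -1. All names, parameters, and modeling choices here (stack encoding, reward, hyperparameters) are illustrative assumptions, not the paper's actual model or mechanisms.

```python
import random
from collections import defaultdict

MAX_HEIGHT = 4  # assumed stack-height limit for this toy example

def retrieve(stacks):
    """Repeatedly pop any top block holding the current minimum priority."""
    stacks = [list(s) for s in stacks]
    while True:
        remaining = [b for s in stacks for b in s]
        if not remaining:
            break
        target = min(remaining)
        popped = False
        for s in stacks:
            if s and s[-1] == target:
                s.pop()
                popped = True
        if not popped:
            break  # target is buried; a relocation is needed
    return tuple(tuple(s) for s in stacks)

def legal_moves(stacks):
    """Restricted BRP: only the top of the target block's stack may move.
    With duplicate priorities we simply take the first stack containing
    a target-priority block (a simplification)."""
    remaining = [b for s in stacks for b in s]
    if not remaining:
        return []
    target = min(remaining)
    src = next(i for i, s in enumerate(stacks) if target in s)
    return [(src, dst) for dst in range(len(stacks))
            if dst != src and len(stacks[dst]) < MAX_HEIGHT]

def step(stacks, move):
    src, dst = move
    stacks = [list(s) for s in stacks]
    stacks[dst].append(stacks[src].pop())
    return retrieve(tuple(tuple(s) for s in stacks))

def q_learn(start, episodes=3000, alpha=0.2, gamma=0.95, eps=0.3):
    """Plain epsilon-greedy tabular Q-learning; reward is -1 per relocation."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = retrieve(start)
        while any(s):
            moves = legal_moves(s)
            if random.random() < eps:
                a = random.choice(moves)
            else:
                a = max(moves, key=lambda m: Q[(s, m)])
            s2 = step(s, a)
            future = max((Q[(s2, m)] for m in legal_moves(s2)), default=0.0)
            Q[(s, a)] += alpha * (-1 + gamma * future - Q[(s, a)])
            s = s2
    return Q

def greedy_relocations(start, Q):
    """Count relocations when following the learned greedy policy."""
    s, n = retrieve(start), 0
    while any(s):
        a = max(legal_moves(s), key=lambda m: Q[(s, m)])
        s = step(s, a)
        n += 1
    return n
```

For instance, with three stacks and the configuration `((1, 3, 2), (), ())` (bottom to top), block 1 is buried under 3 and 2; the greedy policy learned by `q_learn` clears it in two relocations by parking 2 and 3 on separate stacks. The paper's actual algorithm additionally shapes exploration with an optimal-rule-integrated behaviour policy and seeds the Q-table via heuristic-based dynamic initialization, neither of which this plain sketch includes.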