Abstract

Non-permutation flow shop scheduling (NPFS) is an extension of traditional permutation flow shop scheduling with a broader solution space. Reinforcement learning has demonstrated its effectiveness in solving flow shop scheduling problems through its powerful combinatorial optimization capabilities. However, the design and training of end-to-end policy networks are complex, leading to long online training times and limited adaptability of offline-trained models. To overcome these problems, we introduce an NPFS dynamic decision-making process and then propose a novel NPFS method that combines the Q-learning algorithm with a priority dispatching rule (PDR) set. The NPFS dynamic decision-making process decomposes the entire scheduling process into multiple sub-job queues. PDRs demonstrate better scheduling performance when applied to such smaller job queues. The Q-learning algorithm assigns the best-performing PDR to each sub-job queue based on the generation process of the sub-job sequences, yielding an optimized NPFS strategy. The limited number of PDRs in the PDR set and the small number of sub-job queues in the NPFS process contribute to the efficiency of Q-learning in solving the NPFS problem. Finally, we demonstrate the superiority of the proposed NPFS method through a series of numerical experiments, using the machine-based assignment PDR method and the NSGA-II algorithm as performance benchmarks.
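The abstract does not specify the PDR set, the Q-learning state encoding, or the reward signal, so the following minimal Python sketch is only one plausible reading of the approach, not the paper's implementation. It assumes a small hypothetical PDR set (SPT, LPT, FIFO), treats each sub-job queue index as a state, selects a PDR per queue with an epsilon-greedy policy, and feeds back the negative makespan of the resulting full sequence as a terminal reward. For brevity, the makespan routine evaluates the concatenated sequence permutation-style, whereas the NPFS setting in the paper permits different job orders on different machines.

```python
import random

# Hypothetical problem data: proc_times[j][m] = processing time of job j on machine m.
NUM_JOBS, NUM_MACHINES = 12, 3
random.seed(0)
proc_times = [[random.randint(1, 9) for _ in range(NUM_MACHINES)] for _ in range(NUM_JOBS)]

# An assumed small PDR set: each rule orders the jobs within one sub-queue.
PDRS = {
    "SPT": lambda jobs: sorted(jobs, key=lambda j: sum(proc_times[j])),   # shortest total processing time first
    "LPT": lambda jobs: sorted(jobs, key=lambda j: -sum(proc_times[j])),  # longest total processing time first
    "FIFO": lambda jobs: list(jobs),                                      # keep arrival order
}
ACTIONS = list(PDRS)

def split_into_subqueues(jobs, size):
    """Decompose the job list into consecutive sub-job queues (the NPFS decomposition step)."""
    return [jobs[i:i + size] for i in range(0, len(jobs), size)]

def makespan(sequence):
    """Flow shop makespan of a full job sequence (permutation-style simplification)."""
    finish = [0] * NUM_MACHINES
    for j in sequence:
        for m in range(NUM_MACHINES):
            start = max(finish[m], finish[m - 1] if m else 0)
            finish[m] = start + proc_times[j][m]
    return finish[-1]

# Tabular Q-learning: state = sub-queue index, action = choice of PDR.
sub_queues = split_into_subqueues(list(range(NUM_JOBS)), size=4)
Q = {(s, a): 0.0 for s in range(len(sub_queues)) for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.9, 0.2  # assumed hyperparameters

for episode in range(500):
    sequence, trajectory = [], []
    for s, queue in enumerate(sub_queues):
        # Epsilon-greedy PDR choice for this sub-queue.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        sequence += PDRS[a](queue)
        trajectory.append((s, a))
    final_reward = -makespan(sequence)  # shorter makespan -> higher reward
    for s, a in trajectory:
        if s + 1 < len(sub_queues):
            # Mid-episode reward is zero; bootstrap from the best next-state value.
            target = gamma * max(Q[(s + 1, x)] for x in ACTIONS)
        else:
            target = final_reward
        Q[(s, a)] += alpha * (target - Q[(s, a)])

policy = [max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(len(sub_queues))]
print("learned PDR per sub-queue:", policy)
```

Because there is one state per sub-queue and one action per PDR, the Q-table stays tiny, which mirrors the abstract's efficiency argument: limiting both the PDR set and the number of sub-job queues keeps tabular Q-learning tractable.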
