An extended flexible job shop scheduling problem is presented with the characteristics of technology and path flexibility (dual flexibility), varied transportation times, and an uncertain environment. Effective scheduling can greatly increase efficiency and safety in complex scenarios such as distributed vehicle manufacturing and the maintenance of multiple aircraft. However, optimizing the schedule imposes stricter requirements on accuracy, real-time performance, and generalization, while being subject to the curse of dimensionality and typically incomplete information. The various coupling relations among operations, stations, and resources further aggravate the problem. To address these challenges, we propose a multi-agent reinforcement learning algorithm in which the scheduling environment is modeled as a decentralized partially observable Markov decision process. Each job is regarded as an agent that decides its next triplet, i.e., operation, station, and employed resource. The novelty of this paper lies in addressing a flexible job shop scheduling problem that considers both dual flexibility and varied transportation times, and in proposing a double Q-value mixing (DQMIX) optimization algorithm under a multi-agent reinforcement learning framework. Case-study experiments show that the DQMIX algorithm outperforms existing multi-agent reinforcement learning algorithms in terms of solution accuracy, stability, and generalization. In addition, it achieves better solution quality for larger-scale cases than traditional intelligent optimization algorithms.
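As an illustration of the value-mixing architecture the abstract describes, below is a minimal sketch in Python/PyTorch of a QMIX-style monotonic mixing network on which a double Q-value mixing scheme could build. The abstract gives no implementation details, so the class name MonotonicMixer, the layer sizes, and the structure are hypothetical assumptions rather than the authors' code: each job agent contributes the Q-value of its chosen (operation, station, resource) triplet, and state-conditioned hypernetworks produce non-negative weights so that the joint Q-value is monotonic in every agent's Q-value.

# Hypothetical sketch of a QMIX-style monotonic mixer; all names and
# dimensions are illustrative assumptions, not the paper's DQMIX code.
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Mixes per-job-agent Q-values into one joint Q-value.

    Hypernetworks conditioned on the global state emit non-negative
    weights (via abs), so the joint Q-value is monotonically
    non-decreasing in each agent's individual Q-value.
    """
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: map the global state to mixing weights/biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) Q-values, one per job agent,
        # each evaluated at its chosen (operation, station, resource)
        # triplet; state: (batch, state_dim) global shop-floor state.
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # joint Q-value

# Usage sketch: 4 job agents, a 16-dimensional global state.
mixer = MonotonicMixer(n_agents=4, state_dim=16)
q_joint = mixer(torch.randn(8, 4), torch.randn(8, 16))  # shape (8, 1)

In a full DQMIX-style learner one would expect two such Q-value streams (hence "double") combined with target networks to curb overestimation, but the abstract does not specify that detail.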