Abstract
The non-permutation flow-shop scheduling problem (NPFS) is studied. We model it as a Markov decision process, providing a broad arena in which reinforcement learning (RL) algorithms can operate. Although RL approaches with function approximation generate a large number of sequences of highly correlated states, few studies have examined the connections within these state sequences; most merely shuffle their order. To this end, this paper proposes a novel deep reinforcement learning algorithm, named LSTM-TD(0), to address NPFS. Specifically, we design fifteen state features to represent the production state at each scheduling point and fourteen actions to choose an unprocessed operation on a given machine. A long short-term memory (LSTM) network is applied to capture the intrinsic connections of the state sequences in RL-based scheduling. Moreover, we combine the LSTM model with the one-step temporal-difference (TD(0)) algorithm so that each action is selected impartially according to the state value, avoiding the overestimation of action values that frequently occurs in Q-learning. The proposed LSTM-TD(0) is trained with two LSTM networks and further enhanced by a redesigned reward. A series of comparative experiments were conducted among simple heuristic rules, metaheuristics, general DRL methods, and LSTM-TD(0) on a group of well-known benchmark problems of different scales. The comparative results confirm both the superiority and the generality of LSTM-TD(0) over its competitors. Scalability tests show that our approach generalizes to instances of different sizes without retraining or knowledge transfer.
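The abstract's central mechanism is an LSTM state-value estimator trained with a one-step TD(0) target instead of Q-learning action values. The sketch below illustrates that idea only; the network sizes, learning rate, discount factor, sequence length, and reward shape are placeholder assumptions, and the paper's actual two-network architecture and redesigned reward are not reproduced here.

```python
import torch
import torch.nn as nn

# Assumed placeholders: the paper specifies 15 state features; the hidden
# size, discount factor, and learning rate below are illustrative choices.
STATE_DIM, HIDDEN_DIM, GAMMA = 15, 64, 0.99

class LSTMValueNet(nn.Module):
    """Estimates the state value V(s) from a sequence of scheduling states."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(STATE_DIM, HIDDEN_DIM, batch_first=True)
        self.head = nn.Linear(HIDDEN_DIM, 1)

    def forward(self, state_seq):
        # state_seq: (batch, seq_len, STATE_DIM); use the last hidden output
        out, _ = self.lstm(state_seq)
        return self.head(out[:, -1, :]).squeeze(-1)

value_net = LSTMValueNet()
optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)

def td0_update(state_seq, reward, next_state_seq):
    """One-step TD(0): regress V(s) toward the target r + gamma * V(s')."""
    with torch.no_grad():
        target = reward + GAMMA * value_net(next_state_seq)
    loss = nn.functional.mse_loss(value_net(state_seq), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Dummy transition: sequences of 5 consecutive scheduling states.
s, s_next = torch.randn(1, 5, STATE_DIM), torch.randn(1, 5, STATE_DIM)
td0_update(s, torch.tensor([1.0]), s_next)
```

Because actions are then ranked by the estimated value of the state each one leads to, rather than by a learned action-value, this avoids the max-operator bias that inflates Q-learning estimates.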