Abstract

This paper solves the flow-shop scheduling problem (FSP) with reinforcement learning (RL), approximating the value function with a neural network (NN). Under the RL framework, the state, policy, action, reward signal, and value function of the FSP were described in detail. Considering the intrinsic features of the FSP, various problem information was mapped into RL states, including the maximum, minimum, and mean of the makespan; the maximum, minimum, and mean of the remaining operations; and the load on each machine. In addition, the scheduling rules that are optimal for specific states were mapped into RL actions. On this basis, the NN was trained to learn the mapping from states to actions and to select the action with the highest probability in a given state. A reward function was constructed from the idle time (IT) of the machines, and the value function was generalized by the NN. Finally, the algorithm was tested on 23 benchmark instances and more than 7 sets of machine examples. Small relative errors were achieved on 20 of the 23 benchmark instances, and satisfactory results were obtained on all 7 machine sets. These results confirm the effectiveness and generality of the algorithm, and indicate that the FSP can be solved effectively by mapping it fully into the RL framework. The findings provide a reference for solving similar problems with RL algorithms based on value-function approximation.
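To make the state and reward constructions concrete, the sketch below shows one plausible reading of them: a flow-shop state vector built from job-workload statistics, remaining-operation statistics, and per-machine load, plus a reward equal to the negative total machine idle time. The function names, the feature ordering, and the use of a permutation flow shop are assumptions for illustration, not the paper's actual implementation.

```python
import statistics

def evaluate_permutation(proc, order):
    """Simulate a permutation flow shop.
    proc[j][m] = processing time of job j on machine m.
    Returns (makespan, per-machine idle time)."""
    n_machines = len(proc[0])
    comp = [0.0] * n_machines   # completion time of the last job on each machine
    idle = [0.0] * n_machines   # accumulated idle time per machine
    for j in order:
        prev = 0.0              # completion of job j on the previous machine
        for m in range(n_machines):
            start = max(prev, comp[m])
            idle[m] += start - comp[m]
            comp[m] = start + proc[j][m]
            prev = comp[m]
    return comp[-1], idle

def state_features(proc, remaining):
    """Hypothetical RL state: max/min/mean of remaining workload,
    max/min/mean of remaining operation counts, and machine loads."""
    totals = [sum(proc[j]) for j in remaining]        # total remaining work per job
    n_ops = [len(proc[j]) for j in remaining]         # remaining operations per job
    loads = [sum(proc[j][m] for j in remaining)       # remaining load per machine
             for m in range(len(proc[0]))]
    return [max(totals), min(totals), statistics.mean(totals),
            max(n_ops), min(n_ops), statistics.mean(n_ops)] + loads

def idle_time_reward(proc, order):
    """Hypothetical reward signal: negative total machine idle time,
    so minimizing idle time maximizes the reward."""
    _, idle = evaluate_permutation(proc, order)
    return -sum(idle)

# Tiny 3-job, 2-machine instance for illustration.
proc = [[3, 2], [1, 4], [2, 1]]
makespan, _ = evaluate_permutation(proc, [0, 1, 2])
print(makespan)                                  # 10
print(idle_time_reward(proc, [0, 1, 2]))         # -3
print(state_features(proc, [0, 1, 2]))
```

In a full agent, a policy network would take `state_features(...)` as input and output a probability over dispatching rules, with the most probable rule applied at each decision point.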
