Abstract

Deep reinforcement learning (DRL) is a promising new approach to solving the job shop scheduling problem (JSSP). Although DRL is effective for JSSP, there are still deficiencies in state representation, action space definition, and reward function design, which make it difficult for the agent to learn an effective policy. In this paper, we model the JSSP as a Markov decision process (MDP) and design a new state representation based on bidirectional scheduling features, which not only enables the agent to capture more informative state signals and improves its decision-making ability, but also effectively avoids the phenomenon of multiple optimal action selections in the candidate action set. The invalid action masking (IAM) technique is employed to narrow the search space, helping the agent avoid exploring suboptimal solutions. We evaluate the performance of the policy model on eight public test datasets: ABZ, FT, ORB, YN, SWV, LA, TA, and DMU. Extensive experimental results show that the proposed method generally achieves better optimization ability than existing state-of-the-art models and priority dispatching rules.
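The abstract does not give implementation details of the invalid action masking step, but a common way to realize IAM is to assign a large negative value to the logits of unschedulable actions before the softmax, so the policy never samples them. The sketch below is only an illustration of that idea under this assumption; the function and variable names are hypothetical, not taken from the paper.

```python
import numpy as np

def mask_invalid_actions(logits, valid_mask):
    """Illustrative IAM sketch: suppress invalid actions by setting their
    logits to a large negative value, then apply a numerically stable softmax."""
    masked = np.where(valid_mask, logits, -1e9)
    shifted = masked - masked.max()
    probs = np.exp(shifted)
    return probs / probs.sum()

# Example: 5 candidate operations, only operations 0, 2, and 4 are schedulable.
logits = np.array([0.3, 1.2, -0.5, 0.8, 0.1])
valid_mask = np.array([True, False, True, False, True])
print(mask_invalid_actions(logits, valid_mask))  # invalid actions receive ~0 probability
```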
