Abstract
Deep reinforcement learning (DRL) is a promising new approach to the job shop scheduling problem (JSSP). Although DRL methods are effective for solving the JSSP, deficiencies remain in state representation, action-space definition, and reward-function design, which make it difficult for the agent to learn an effective policy. In this paper, we model the JSSP as a Markov decision process (MDP) and design a new state representation based on the state features of bidirectional scheduling. This representation not only enables the agent to capture more informative state features and improves its decision-making ability, but also effectively avoids the phenomenon of multiple optimal action selections in the candidate action set. The invalid action masking (IAM) technique is employed to narrow the search space, which helps the agent avoid exploring suboptimal solutions. We evaluate the performance of the policy model on eight public test datasets: ABZ, FT, ORB, YN, SWV, LA, TA, and DMU. Extensive experimental results show that the proposed method, on the whole, has better optimization ability than existing state-of-the-art models and priority dispatching rules.
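The invalid action masking mentioned above is commonly realized by suppressing the logits of non-schedulable operations before the policy samples an action, so that only valid candidates receive probability mass. The sketch below illustrates this idea only; the function name, tensor shapes, and use of PyTorch are illustrative assumptions and are not taken from the paper.

```python
import torch


def masked_action_sample(logits: torch.Tensor, valid_mask: torch.Tensor) -> torch.Tensor:
    """Sample an action while masking invalid ones.

    logits:     (batch, num_actions) raw policy-network outputs
    valid_mask: (batch, num_actions) boolean, True where the operation is currently schedulable
    """
    # Push invalid actions to -inf so their softmax probability becomes zero.
    masked_logits = logits.masked_fill(~valid_mask, float("-inf"))
    # Renormalize over the remaining (valid) actions and sample.
    probs = torch.softmax(masked_logits, dim=-1)
    return torch.distributions.Categorical(probs=probs).sample()
```

In this sketch, the search space is narrowed because gradients and samples are confined to valid operations; the environment only needs to supply the boolean mask at each decision step.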