In real-life production systems, job arrivals are usually unpredictable, which makes it necessary to develop robust reactive scheduling policies to meet delivery requirements. Deep reinforcement learning (DRL) based scheduling methods can respond quickly to dynamic events by learning from training data. However, most policy networks in DRL algorithms are trained to choose priority dispatching rules (PDRs); thus, the quality of the obtained scheduling plans is, to some extent, limited by the performance of the PDRs. This paper investigates a dynamic flexible job shop scheduling problem with random job arrivals, with the objective of minimizing total tardiness. A DRL-based reactive scheduling method, proximal policy optimization with an attention-based policy network (PPO-APN), is proposed to make real-time decisions in the dynamic scheduling environment; unlike action spaces that consist of PDRs, the attention-based policy network (APN) directly selects pending jobs. Additionally, a global/local reward function (GLRF) is designed to address the reward sparsity issue during training. The proposed PPO-APN is tested on randomly generated instances with different production configurations and compared with frequently used PDRs and DRL-based methods. Numerical results indicate that the APN and GLRF components significantly improve training efficiency, and that PPO-APN achieves better overall performance than the other methods.

Note to Practitioners—This work is motivated by a typical production scenario in discrete manufacturing systems, where orders arrive at the shop floor at random and must be scheduled within a short time to ensure on-time delivery. Previous research tends to apply DRL algorithms to choose suitable dispatching rules for ease of implementation. Nevertheless, the jobs that dispatching rules can select are rather limited, so many potentially high-quality scheduling plans are ignored. This work first sorts all unscheduled jobs with a heuristic algorithm and places the top-ranked jobs in a pool. When a machine becomes available, it directly chooses a job from the pool as its next processing task. The job-selection policy is represented by a novel attention-based network and trained with a powerful DRL algorithm. This process is executed repeatedly in a simulation environment to collect training data; after a period of training, the policy improves and can be applied to make sound decisions in real time. The proposed reactive scheduling method has been shown to be more efficient than dispatching rules and existing DRL-based approaches, and it is effective for production scheduling in a wide variety of discrete manufacturing scenarios, such as the automobile and electronics industries. Moreover, the method can be extended to dynamic scheduling problems with additional production characteristics by adding constraints on job selection or by redefining the computation of operation completion times accordingly.
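To make the pool-based job selection concrete, the following minimal Python/PyTorch sketch shows one way a candidate pool and an attention-based scoring network could fit together. The pool heuristic (earliest due date), the feature layout, and the layer sizes are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class AttentionJobSelector(nn.Module):
    """Scores each job in the candidate pool with additive attention and
    returns a categorical distribution over the pool (the action space)."""

    def __init__(self, job_feat_dim: int, state_feat_dim: int, hidden: int = 64):
        super().__init__()
        self.job_proj = nn.Linear(job_feat_dim, hidden)
        self.state_proj = nn.Linear(state_feat_dim, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, job_feats, state_feat):
        # job_feats: (pool_size, job_feat_dim); state_feat: (state_feat_dim,)
        h = torch.tanh(self.job_proj(job_feats) + self.state_proj(state_feat))
        logits = self.score(h).squeeze(-1)  # one logit per pending job
        return torch.distributions.Categorical(logits=logits)


def build_pool(unscheduled_jobs, pool_size):
    """Rank unscheduled jobs with a heuristic (earliest due date here, an
    assumed stand-in for the paper's heuristic) and keep the top-ranked ones."""
    ranked = sorted(unscheduled_jobs, key=lambda j: j["due_date"])
    return ranked[:pool_size]


# Usage when a machine becomes idle: build the pool, featurize it,
# and sample the next job from the learned policy.
selector = AttentionJobSelector(job_feat_dim=6, state_feat_dim=8)
jobs = [{"due_date": d} for d in (12.0, 5.0, 9.0, 20.0)]
pool = build_pool(jobs, pool_size=3)
job_feats = torch.randn(len(pool), 6)   # per-job features (placeholder values)
state_feat = torch.randn(8)             # shop-floor state features (placeholder)
action = selector(job_feats, state_feat).sample()  # index into the pool
```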
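The global/local reward idea can be sketched in the same hedged spirit: a dense local term gives immediate feedback at each dispatching decision, while a sparse global term (the negative total tardiness objective) is added only at the end of an episode. The specific terms and weights below are assumptions; the paper defines its own GLRF.

```python
def local_reward(op_completion_time, job_due_date):
    # Dense per-decision signal: penalize tardiness added by this operation.
    return -max(0.0, op_completion_time - job_due_date)


def global_reward(completion_times, due_dates):
    # Sparse end-of-episode signal: negative total tardiness over all jobs,
    # i.e. the scheduling objective itself.
    return -sum(max(0.0, completion_times[j] - due_dates[j]) for j in due_dates)


def glrf_reward(is_terminal, local_r, global_r, w_local=0.1, w_global=1.0):
    # Blend the dense local term with the delayed global objective; the
    # weights here are illustrative assumptions, not the paper's values.
    return w_local * local_r + (w_global * global_r if is_terminal else 0.0)
```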