Abstract

Dynamic scheduling problems have received increasing attention in recent years because of their practical implications. To enable real-time, intelligent decision-making in dynamic scheduling, we study the dynamic permutation flowshop scheduling problem (PFSP) with new job arrivals using deep reinforcement learning (DRL). A system architecture for solving the dynamic PFSP with DRL is proposed, and a mathematical model minimizing total tardiness cost is established. The DRL-based intelligent scheduling system is then modeled, with its state features, actions, and reward designed, and the advantage actor-critic (A2C) algorithm is adapted to train the scheduling agent. The learning curve indicates that the scheduling agent learns to generate better solutions efficiently during training. Extensive experiments compare the A2C-based scheduling agent with each individual action, other DRL algorithms, and meta-heuristics. The results show that the A2C-based scheduling agent performs well in terms of solution quality, CPU time, and generalization. Notably, the trained agent generates a scheduling action in only 2.16 ms on average, which is almost instantaneous and therefore suitable for real-time scheduling. Our work can help build a self-learning, real-time optimizing, and intelligent decision-making scheduling system.
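To make the real-time inference step concrete, the following is a minimal, hypothetical PyTorch sketch of an A2C-style scheduling agent selecting a dispatching action from workshop state features. The feature count, action count, and network sizes are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal, hypothetical sketch of A2C-style action selection for the scheduling
# agent. N_FEATURES, N_ACTIONS, and the layer sizes are assumptions made for
# illustration only; the paper's exact architecture may differ.
import torch
import torch.nn as nn

N_FEATURES = 12   # assumed number of workshop state features
N_ACTIONS = 6     # assumed number of candidate scheduling actions

class ActorCritic(nn.Module):
    def __init__(self, n_features: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)  # policy head: action logits
        self.critic = nn.Linear(hidden, 1)         # value head: state value V(s)

    def forward(self, state: torch.Tensor):
        h = self.shared(state)
        return self.actor(h), self.critic(h)

agent = ActorCritic(N_FEATURES, N_ACTIONS)

@torch.no_grad()
def select_action(state_features) -> int:
    """Sample a scheduling action from the current policy (inference only, fast)."""
    state = torch.as_tensor(state_features, dtype=torch.float32)
    logits, _ = agent(state)
    return int(torch.distributions.Categorical(logits=logits).sample())
```

For example, `select_action([0.3] * N_FEATURES)` returns the index of the sampled dispatching action for that state; because inference is a single small forward pass, per-decision latencies on the order of milliseconds are plausible.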

Highlights

  • This paper solves the dynamic permutation flowshop scheduling problem (PFSP) with new job arrivals to minimize total tardiness cost using deep reinforcement learning (DRL)

  • This study aims to establish an intelligent decision-making scheduling system to provide real-time optimization for dynamic scheduling problems

  • A DRL-based scheduling system is proposed, with state features, actions, and a reward designed for the scheduling agent and the workshop environment (a minimal environment sketch follows this list)
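The last highlight refers to the designed state features, actions, and reward. Below is a minimal, hypothetical gym-style skeleton of such a workshop environment; the class name, dispatching-rule set, feature vector, and reward shaping are illustrative assumptions rather than the paper's exact design.

```python
# Minimal, hypothetical skeleton of the workshop environment seen by the agent:
# a state-feature vector, discrete actions (assumed here to be dispatching
# rules), and a reward derived from the growth of total tardiness cost.
# All names and the reward shaping are illustrative assumptions.
import numpy as np

DISPATCHING_RULES = ["SPT", "LPT", "EDD", "FIFO"]  # assumed candidate actions

class DynamicPFSPEnv:
    def __init__(self, instance):
        self.instance = instance        # processing times, due dates, job arrivals
        self.total_tardiness_cost = 0.0

    def _state_features(self) -> np.ndarray:
        # Assumed features, e.g. machine utilization, due-date tightness,
        # remaining work, and the number of waiting or newly arrived jobs.
        return np.zeros(12, dtype=np.float32)

    def reset(self) -> np.ndarray:
        self.total_tardiness_cost = 0.0
        return self._state_features()

    def step(self, action: int):
        rule = DISPATCHING_RULES[action]
        # ...apply `rule` to sequence the currently waiting jobs, advance the
        # simulation clock, and recompute the accumulated tardiness cost...
        new_cost = self.total_tardiness_cost
        reward = -(new_cost - self.total_tardiness_cost)  # penalize added tardiness cost
        self.total_tardiness_cost = new_cost
        done = False  # True once all jobs, including new arrivals, are completed
        return self._state_features(), reward, done, {}
```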


Summary

Introduction

Wu et al. [54] used deep learning to solve the dynamic dispatching of unreliable machines in re-entrant production systems. They combined a deep neural network (DNN) with Markov decision processes (MDP) to assign different priorities to job groups in order to minimize cycle time or maximize throughput. Li et al. [55] studied the flexible job-shop scheduling problem (FJSP) with sequence-dependent setup times and limited dual resources using machine learning and meta-heuristics. However, the dynamic PFSP with new job arrivals and a total tardiness cost criterion has not yet been solved by DRL. This paper studies the dynamic PFSP with new job arrivals to minimize total tardiness cost using DRL; to the best of our knowledge, this is the first attempt to do so. Our results show that the DRL-based scheduling method outperforms traditional meta-heuristics (IG and GA) in solution quality and CPU time by a large margin for the dynamic PFSP.
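As a concrete reading of the objective, assuming each job j has completion time C_j, due date d_j, and a unit tardiness cost w_j (the paper's exact cost coefficients are not given in this excerpt), the total tardiness cost to be minimized can be written as:

```latex
\min \ \mathrm{TTC} \;=\; \sum_{j=1}^{n} w_j \, T_j,
\qquad T_j = \max\{0,\; C_j - d_j\}
```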

Problem Description
Mathematical Modelling of the Intelligent Scheduling System
Reward
State Features
Actions
Numerical Experiments
Training Process of A2C
Comparison with SDR
Comparison with DRL and Meta-Heuristics (IG and GA)
Generalization
Findings
Conclusions