The unprecedented prosperity of the Industrial Internet of Things (IIoT) is driving the transformation of traditional industry into intelligent manufacturing, in which the whole production process can be comprehensively controlled to achieve flexible production. Intelligent scheduling, one of the key enabling techniques, is expected to allocate production across several machines efficiently while minimizing the makespan. Existing approaches adopt a fixed search paradigm based on expert knowledge to seek satisfactory solutions. However, given the varying data distributions and large scale of practical problems, these methods fail to guarantee the quality of the obtained solution under real-time requirements. To address this challenge, we formulate the production scheduling problem as a Markov decision process (MDP) and design a job scheduling model that incorporates a job batching module for the hybrid flow-shop scheduling problem on batch processing machines (HFSP-BPM). The proposed model consists of an actor network, which learns the action to take under different conditions, and a critic network, which evaluates the actor's actions. We analyze the convergence of the model under different parameter settings to determine the optimal parameters. Extensive numerical experiments on both a publicly available dataset and a real steel plant production dataset demonstrate that the proposed deep reinforcement learning (DRL) approach outperforms the baselines, with average improvements of more than 6% observed in many instances.