This study presents a methodology that uses Deep Recurrent Q-Learning to develop an agent that acts as an online scheduler for flow-shop or job-shop batch plants operating under a zero-wait restriction and uncertainty. The environment is assumed to be partially observable: a single observation does not satisfy the Markov property, so information has to be gathered from previous time intervals. The processing times of the machines are unknown to the agent, and production demand realizations are revealed during operation rather than known a priori. The agent aims to meet the demands while minimizing the makespan of the process, and it should avoid violating constraints associated with product allocation, time horizon, and storage capacity. Three case studies, covering two job-shops and one flow-shop, show the benefits of this framework in environments where information is limited. The results show that the agents can generate schedules that account for the uncertain parameters of the system while aiming to reduce the makespan. Tests on the agents resulted in small errors in the decision-making process (less than 2%), demonstrating that a Deep Recurrent Q-Network (DRQN) can serve as a reliable tool for online scheduling under uncertainty in partially observable job-shops and flow-shops.
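To illustrate why recurrence helps under partial observability, the following is a minimal sketch of a recurrent Q-function, not the authors' implementation. It assumes a simple Elman-style cell in place of whatever recurrent layer the study uses, and every name in it (`TinyRecurrentQ`, `epsilon_greedy`, `td_target`, the dimensions, the random weights) is hypothetical. It shows only how Q-values depend on a hidden state that summarizes earlier observations, together with the standard one-step Q-learning target; training, the plant model, and constraint handling are omitted.

```python
import math
import random


class TinyRecurrentQ:
    """Hypothetical recurrent Q-function (illustration only, untrained).

    h_t = tanh(W_x o_t + W_h h_{t-1}),  Q_t = W_q h_t
    """

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a real agent would learn these.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_x = init(hidden_dim, obs_dim)
        self.w_h = init(hidden_dim, hidden_dim)
        self.w_q = init(n_actions, hidden_dim)
        self.hidden_dim = hidden_dim

    def initial_state(self):
        # Hidden state at the start of an episode (no history yet).
        return [0.0] * self.hidden_dim

    def step(self, obs, hidden):
        # Fold the current observation into the summary of past observations.
        new_hidden = [
            math.tanh(
                sum(w * o for w, o in zip(row_x, obs))
                + sum(w * h for w, h in zip(row_h, hidden))
            )
            for row_x, row_h in zip(self.w_x, self.w_h)
        ]
        # One Q-value per scheduling action, computed from the hidden state.
        q_values = [sum(w * h for w, h in zip(row, new_hidden)) for row in self.w_q]
        return q_values, new_hidden


def epsilon_greedy(q_values, epsilon, rng):
    # Explore with probability epsilon, otherwise pick the highest Q-value.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, gamma, done):
    # Standard one-step Q-learning target.
    return reward if done else reward + gamma * max(next_q_values)


# Usage: the same observation yields different Q-values after different histories.
net = TinyRecurrentQ(obs_dim=2, hidden_dim=3, n_actions=4)
q_fresh, _ = net.step([0.0, 1.0], net.initial_state())
_, h_after_first = net.step([1.0, 0.0], net.initial_state())
q_with_history, _ = net.step([0.0, 1.0], h_after_first)
action = epsilon_greedy(q_with_history, epsilon=0.0, rng=random.Random(1))
```

Because the hidden state carries information from earlier time intervals, the agent can distinguish situations that look identical from the current observation alone, which is the property a feed-forward Q-network lacks in this setting.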