Abstract

Modern production systems face enormous challenges due to rising customer requirements, resulting in increasingly complex production systems. Operational efficiency in a competitive industry is ensured by an adequate production control system that manages all operations in order to optimize key performance indicators. Current control systems are mostly based on static, model-based heuristics that require significant human domain knowledge and, hence, do not match the dynamic environment of manufacturing companies. Data-driven reinforcement learning (RL) has shown compelling results in applications such as board and computer games, as well as in first production applications. This paper addresses the design of RL for an adaptive production control system, using the real-world example of order dispatching in a complex job shop. As RL algorithms are “black box” approaches, they inherently impede a comprehensive understanding of the learned behavior. Furthermore, experience with advanced RL algorithms is still limited to a few successful applications, which restricts the transferability of results. In this paper, we examine how the design of the state representation, action space, and reward function affects the performance of the RL agent, and by analyzing the results we identify robust RL designs. This makes RL an advantageous control approach for highly dynamic and complex production systems, especially when domain knowledge is limited.

Highlights

  • Manufacturing companies face ever-increasing complexity in their internal operational processes, driven by versatile and fast-changing market environments (Abele and Reinhart 2011)

  • While machines select orders to be processed according to a First In First Out (FIFO) rule, i.e., the order waiting longest in the entry buffer, a reinforcement learning (RL) agent is used to decide which order to dispatch; an action consists of the source and destination resource of the transport to be performed

  • Thereby, existing challenges in the application of reinforcement learning methods are identified and addressed: First, the design of the state information passed to the agent, on which its dispatching decisions are based, influences the learned performance (a minimal interface sketch follows this list)

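The highlights describe an RL agent whose action is a (source, destination) transport and whose state design affects the learned performance. As a concrete illustration of such an interface, the following is a minimal sketch of a job-shop dispatching environment; the class `JobShopDispatchEnv`, the buffer-fill state, the waiting-time-based reward, and all parameter values are illustrative assumptions, not the design evaluated in the paper.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass
class Order:
    order_id: int
    arrival_step: int


class JobShopDispatchEnv:
    """Illustrative job-shop dispatching environment (not the paper's design).

    State:  normalized buffer fill levels of all resources.
    Action: a (source, destination) resource pair for one transport.
    Reward: negative waiting time of the dispatched order, one of many
            possible order-related performance indicators.
    """

    def __init__(self, n_resources=4, buffer_capacity=5, seed=0):
        self.rng = random.Random(seed)
        self.n_resources = n_resources
        self.buffer_capacity = buffer_capacity
        self.buffers = [deque() for _ in range(n_resources)]  # FIFO buffers
        self.step_count = 0

    def reset(self):
        self.step_count = 0
        for buf in self.buffers:
            buf.clear()
        return self._observe()

    def _observe(self):
        # State design choice: buffer fill levels, normalized to [0, 1].
        return [len(b) / self.buffer_capacity for b in self.buffers]

    def step(self, action):
        source, destination = action
        self.step_count += 1
        reward = 0.0
        if self.buffers[source] and len(self.buffers[destination]) < self.buffer_capacity:
            order = self.buffers[source].popleft()  # FIFO: longest-waiting first
            self.buffers[destination].append(order)
            reward = -(self.step_count - order.arrival_step)  # penalize waiting
        # New orders arrive stochastically at resource 0, the system entry.
        if self.rng.random() < 0.5 and len(self.buffers[0]) < self.buffer_capacity:
            self.buffers[0].append(Order(self.step_count, self.step_count))
        return self._observe(), reward, False
```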

Introduction

Manufacturing companies face ever-increasing complexity in their internal operational processes, driven by versatile and fast-changing market environments (Abele and Reinhart 2011). The approaches are compared based on the following requirements:

  • a job shop manufacturing system as the considered use case, with its degrees of freedom
  • highly volatile environment conditions in terms of high volume and many product variants
  • complex internal material flow due to the job shop setup
  • consideration of stochastic events such as machine breakdowns
  • order dispatching with respect to transport or processing
  • an adaptive production control system
  • near real-time decision-making
  • a multi-objective target system for optimization
  • dynamic evaluation of performance
  • order- or resource-related performance indicators
  • RL as the solution algorithm
  • discrete-event simulation for validation
  • recommendations for the RL design

While machines select orders to be processed according to a FIFO rule, i.e., the order waiting longest in the entry buffer, an RL agent is used to decide which order to dispatch next; an action consists of the source and destination resource of the transport to be performed. A baseline implementation of this FIFO dispatching rule is sketched below.

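The introduction states that machines process orders in FIFO sequence while the agent chooses the transport. As a hedged sketch of a FIFO baseline against which such an agent could be compared, the helper below (reusing the hypothetical `JobShopDispatchEnv` from the snippet above) always transports the longest-waiting order; the destination rule is our assumption, as the excerpt does not specify one.

```python
def fifo_dispatch_action(env):
    """Baseline heuristic: move the longest-waiting order (FIFO).

    Picks the buffer whose head order arrived earliest and sends that
    order to the other resource with the most free buffer capacity
    (the destination rule is an assumption for this sketch).
    """
    candidates = [(buf[0].arrival_step, src)
                  for src, buf in enumerate(env.buffers) if buf]
    if not candidates:
        return (0, 1)  # nothing to move; step() treats this as a no-op
    _, source = min(candidates)
    destination = min((r for r in range(env.n_resources) if r != source),
                      key=lambda r: len(env.buffers[r]))
    return (source, destination)


# Roll out the baseline for a short episode.
env = JobShopDispatchEnv()
obs = env.reset()
total_reward = 0.0
for _ in range(100):
    obs, reward, done = env.step(fifo_dispatch_action(env))
    total_reward += reward
print("FIFO baseline return:", total_reward)
```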