Abstract

The impacts of the COVID-19 pandemic and the chip crisis on industrial production are just two examples of the complex and volatile environment that production systems must cope with. Operational scheduling tasks in particular suffer from these influences, because decision-making becomes more complex and rescheduling more frequent. At the same time, scheduling strongly affects production-related key performance indicators such as lead time and resource utilization. As several publications have already shown, applying data-based methods from the field of Reinforcement Learning (RL) to complex scheduling tasks offers great potential for handling the challenges that arise in such an environment. The building blocks of an RL approach depend strongly on a company's individual task specifications and optimization goals. Yet no methodology has been introduced that considers these specifications in the design of RL approaches for production scheduling, which hinders the transfer from laboratory examples to wide-ranging industrial applications. To address this research gap, this paper provides a conceptual methodology for generating RL solutions depending on the scheduling tasks and objectives. The proposed methodology consists of four central modules that constitute the building blocks of an RL solution. The first module derives the action space of the RL approach from the underlying scheduling tasks. The second module constructs the reward function from a company's individual scheduling targets. The third module derives the state vector from the components of the reward function. The last module selects an appropriate optimization algorithm for the RL approach and merges the previous modules to learn a scheduling policy in a simulation environment; this policy can then be applied to real-world problems.
As a result, the application of RL-based scheduling enables production systems to meet current requirements and evolve into resilient and self-optimizing systems.
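
To make the four modules more concrete, the following minimal Python sketch shows how they could fit together on a toy problem. Everything in it is an illustrative assumption rather than the method of the paper: the single-machine setting, the two dispatching rules used as actions, the weights on tardiness and flow time, the coarse state features, and the choice of tabular Q-learning as the optimization algorithm. All function and variable names are hypothetical.

```python
import random
from collections import defaultdict

# Module 1 (assumed): the task "sequence jobs on one machine" yields
# one action per dispatching rule.
ACTIONS = ("SPT", "EDD")  # shortest processing time, earliest due date

def pick_job(queue, action):
    """Return the index of the job selected by the chosen dispatching rule."""
    field = "proc" if action == "SPT" else "due"
    return min(range(len(queue)), key=lambda i: queue[i][field])

# Module 2 (assumed): the reward is a weighted sum of company targets,
# negated so that lower tardiness and flow time give a higher reward.
def reward(tardiness, flow_time, weights):
    return -(weights["tardiness"] * tardiness + weights["flow_time"] * flow_time)

# Module 3 (assumed): state features mirror the reward components:
# queue length relates to flow time, jobs at risk relate to tardiness.
def state(queue, clock):
    at_risk = sum(1 for job in queue if job["due"] < clock + job["proc"])
    return (min(len(queue), 5), min(at_risk, 5))

def make_jobs(rng, n=6):
    """Toy simulation input: n jobs, all released at time 0."""
    return [{"proc": rng.randint(1, 5), "due": rng.randint(5, 20)} for _ in range(n)]

# Module 4 (assumed): tabular Q-learning trained against the simulation.
def run_episode(q, rng, weights, eps=0.1, alpha=0.1, gamma=0.95):
    queue, clock, total = make_jobs(rng), 0, 0.0
    while queue:
        s = state(queue, clock)
        if rng.random() < eps:  # explore
            a = rng.randrange(len(ACTIONS))
        else:                   # exploit the current value estimates
            a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        job = queue.pop(pick_job(queue, ACTIONS[a]))
        clock += job["proc"]
        # Release time is 0, so the completion time equals the flow time.
        r = reward(max(0, clock - job["due"]), clock, weights)
        # Bootstrap from the next state unless the episode has ended.
        target = r + (gamma * max(q[state(queue, clock)]) if queue else 0.0)
        q[s][a] += alpha * (target - q[s][a])
        total += r
    return total

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    weights = {"tardiness": 1.0, "flow_time": 0.1}
    q = defaultdict(lambda: [0.0] * len(ACTIONS))
    returns = [run_episode(q, rng, weights) for _ in range(episodes)]
    return q, returns
```

Calling `train()` returns the learned Q-table and the return of each episode. In a realistic setting, each of these pieces would be replaced by a company-specific counterpart, for example a discrete-event simulation of the shop floor in place of `make_jobs` and a deep RL algorithm in place of the Q-table.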

