Abstract

Designing preventive maintenance (PM) policies that ensure smooth and efficient production for large-scale manufacturing systems is non-trivial. Recent model-free reinforcement learning (RL) methods shed light on how to cope with the non-linearity and stochasticity in such complex systems. However, the explosion of the action space prevents RL-based PM policies from generalizing to real applications. To obtain cost-efficient PM policies for a serial production line with multiple levels of PM actions, a novel multi-agent formulation is adopted that supports adaptive learning by modeling each machine as a cooperative agent. The evaluation of system-level production loss is leveraged to construct the reward function. An adaptive learning framework based on a value-decomposition multi-agent actor–critic algorithm is used to obtain PM policies. In a simulation study, the proposed framework demonstrates its effectiveness by outperforming other baselines on a comprehensive set of metrics, whereas centralized RL-based methods struggle to converge to stable policies. Our analysis further shows that the multi-agent reinforcement learning based method learns effective PM policies without prior knowledge of the environment or maintenance strategies.
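To make the described setup concrete, the following is a minimal sketch (not the authors' implementation) of a value-decomposition multi-agent actor–critic loop in which each machine is a cooperative agent with a discrete PM-level action and all agents share a reward derived from system-level production loss. The sizes (N_MACHINES, N_PM_LEVELS, OBS_DIM), network shapes, and the stand-in reward are illustrative assumptions.

```python
# Hypothetical sketch of value-decomposition multi-agent actor-critic for PM.
# Each machine has its own actor and critic; per-agent values are summed into
# a joint value (value decomposition) and trained against a shared reward.
import torch
import torch.nn as nn

N_MACHINES = 4    # assumed number of machines in the serial line
N_PM_LEVELS = 3   # assumed PM action levels per machine (e.g. none / minor / major)
OBS_DIM = 8       # assumed size of each machine's local observation

class Agent(nn.Module):
    """One cooperative agent per machine: a local actor and a local critic."""
    def __init__(self):
        super().__init__()
        self.actor = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                   nn.Linear(64, N_PM_LEVELS))
        self.critic = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                    nn.Linear(64, 1))

agents = nn.ModuleList(Agent() for _ in range(N_MACHINES))
optimizer = torch.optim.Adam(agents.parameters(), lr=1e-3)

def step(obs, reward):
    """One learning step on a single transition.

    obs    : (N_MACHINES, OBS_DIM) local observations, one row per machine
    reward : scalar shared reward, e.g. negative system-level production loss
    """
    log_probs, values = [], []
    for i, agent in enumerate(agents):
        dist = torch.distributions.Categorical(logits=agent.actor(obs[i]))
        action = dist.sample()                  # PM level chosen for machine i
        log_probs.append(dist.log_prob(action))
        values.append(agent.critic(obs[i]).squeeze())
    # Value decomposition: the joint value is the sum of per-agent values.
    joint_value = torch.stack(values).sum()
    advantage = reward - joint_value
    actor_loss = -(torch.stack(log_probs).sum() * advantage.detach())
    critic_loss = advantage.pow(2)
    optimizer.zero_grad()
    (actor_loss + critic_loss).backward()
    optimizer.step()

# Example usage with random data standing in for the production-line simulator:
obs = torch.randn(N_MACHINES, OBS_DIM)
reward = torch.tensor(-1.7)   # negative production loss observed after the PM actions
step(obs, reward)
```

Summing per-agent values into a joint value keeps each machine's action space small while letting credit for the shared production-loss reward be distributed across agents, which is the usual motivation for value decomposition in cooperative settings.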
