A heuristically accelerated reinforcement learning method for maintenance policy of an assembly line

Xiao Wang, Na Qu, Guowei Zhang, Yongqiang Li

doi:10.3934/jimo.2022047

Abstract

<p style='text-indent:20px;'>This paper investigates the maintenance policy for a two-machine one-buffer (2M1B) assembly line system. The observed quality states of the deteriorating machines in the system are assumed to be characterized by multiple decreasing yield stages. A semi-Markov decision process (SMDP) model describes the deterioration process of the system, and a heuristically accelerated multi-agent reinforcement learning (HAMRL) method is proposed to solve the model. Asynchronous updating rules are introduced in the HAMRL method. The production time, preventive maintenance (PM) time and corrective repair (CR) time are random, and the deterioration mode of the machines is not fixed. The method is also compared with two exploration algorithms in reinforcement learning (RL): one based on simulated annealing search (SAS) and one based on neighborhood search (NS). The empirical results indicate that the proposed HAMRL algorithm can speed up the learning process and offers an advantage on problems with larger state spaces and more realistic settings. The maintenance strategy for the 2M1B assembly line system is obtained once the system average cost rate has converged. This paper provides new, practical insights into the application and selection of techniques for the maintenance policy of 2M1B assembly line systems.</p>
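The abstract does not spell out HAMRL's update rules. As a rough, non-authoritative illustration, the sketch below combines two published ideas the abstract alludes to. The first is heuristically accelerated action selection, which exploits by minimising Q(s, a) minus a weighted heuristic H(s, a), in the spirit of heuristically accelerated Q-learning. The second is a SMART-style average-cost Q-learning update for an SMDP, which charges the estimated cost rate rho for every unit of sojourn time. It is single-agent and synchronous, and it runs on a hypothetical one-machine toy model rather than the paper's 2M1B line. The stage count `K`, the cost figures, the sojourn-time distributions, the `heuristic` function with its `threshold`, and all parameter names are assumptions made for illustration.

```python
import random

# Hypothetical toy model (NOT the paper's 2M1B model): one machine whose yield
# degrades through operational stages 0..K-1; stage K means "failed".
K = 4
PRODUCE, PM, CR = 0, 1, 2        # CR is forced in the failed stage

def actions(s):
    return [CR] if s == K else [PRODUCE, PM]

def step(s, a, rng):
    """Sample (cost, sojourn_time, next_state). All numbers are assumptions."""
    if a == PRODUCE:
        tau = rng.expovariate(1.0)               # random production time
        cost = 2.0 * s * tau                     # yield loss grows with stage
        nxt = s + 1 if rng.random() < 0.3 else s # stochastic deterioration
        return cost, tau, nxt
    if a == PM:
        return 5.0, rng.uniform(0.2, 0.5), 0     # random PM time, as good as new
    tau = rng.uniform(1.0, 2.0)                  # random CR time
    return 20.0 + 3.0 * tau, tau, 0

def heuristic(s, a, threshold=3):
    """Hypothetical H(s, a): favour PM at or beyond a threshold stage."""
    if a == CR:
        return 0.0
    prefer_pm = s >= threshold
    return 1.0 if (a == PM) == prefer_pm else 0.0

def select_action(Q, s, rng, epsilon, xi):
    """Epsilon-greedy for cost minimisation, biased by xi * H(s, a).
    Returns (action, explored_flag)."""
    acts = actions(s)
    if rng.random() < epsilon:
        return rng.choice(acts), True
    return min(acts, key=lambda a: Q[(s, a)] - xi * heuristic(s, a)), False

def train(steps=50_000, alpha=0.05, epsilon=0.1, xi=1.0, seed=0):
    """SMART-style average-cost Q-learning with heuristic action selection."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(K + 1) for a in actions(s)}
    total_cost = total_time = 0.0
    rho = 0.0                                    # estimated average cost rate
    s = 0
    for _ in range(steps):
        a, explored = select_action(Q, s, rng, epsilon, xi)
        cost, tau, nxt = step(s, a, rng)
        # Relative-value target: subtract rho for the time spent in this epoch.
        target = cost - rho * tau + min(Q[(nxt, b)] for b in actions(nxt))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        if not explored:                         # update rho on exploit steps only
            total_cost += cost
            total_time += tau
            if total_time > 0:
                rho = total_cost / total_time
        s = nxt
    # Final policy is greedy in Q alone; the heuristic only guides learning.
    policy = {s: min(actions(s), key=lambda a: Q[(s, a)]) for s in range(K)}
    return Q, rho, policy
```

Calling `train()` returns the learned Q-table, the running estimate of the average cost rate, and a PM/produce decision for each operational stage. The heuristic only biases which actions are tried. Because the final policy is read from Q alone, a poor heuristic can slow learning but does not directly fix the outcome.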
