Safe Reinforcement Learning Algorithm and Its Application in Intelligent Control for CPS

Hengjun Zhao, Quanzhong Li, Zhiming Liu, Xia Zeng

doi:10.21655/ijsi.1673-7288.00284

Abstract

The design of a safe controller for a Cyber-Physical System (CPS) is an active research topic. Existing safe controller design based on formal methods suffers from problems such as excessive reliance on models and poor scalability. Intelligent control based on Deep Reinforcement Learning (DRL) can handle high-dimensional, nonlinear, complex, and uncertain systems and is becoming a very promising CPS control technology, but it lacks safety guarantees. This study addresses the safety issues of Reinforcement Learning (RL) control by analyzing a typical case, an industrial oil pump control system, and investigates a Safe Reinforcement Learning (SRL) algorithm and its application to intelligent control. First, the SRL problem of the industrial oil pump control system is formalized, and a simulation environment of the oil pump is built. Then, by designing the structure and activation function of the output layer, an oil pump controller in the form of a neural network is constructed to satisfy the linear inequality constraints on the on-off operations of the oil pump. Finally, to better balance the control objectives of safety and optimality, a new SRL algorithm is designed based on the Augmented Lagrange Multiplier (ALM) method. A comparative experiment on the industrial oil pump shows that the controller synthesized by the proposed algorithm surpasses existing similar algorithms in both safety and optimality. During the evaluation, the neural network controllers synthesized in this study pass rigorous formal verification with a probability of 90%. Meanwhile, compared with the theoretically optimal controller, the neural network controllers incur a loss of 2% in the optimal objective value. The proposed method is expected to be applied in more scenarios, and the case study scheme may provide a reference for other researchers in the field of safe intelligent control and formal verification.
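To make the ALM idea mentioned in the abstract concrete, the following is a minimal sketch of the generic augmented Lagrangian method on a toy one-dimensional problem. It is not the SRL algorithm proposed in the paper: the objective f(x) = (x - 2)^2, the constraint g(x) = x - 1 <= 0, the function name, and all hyperparameters are assumptions chosen only for illustration. In an SRL setting, f would loosely correspond to the negated expected return and g to a safety cost minus its allowed threshold.

```python
def augmented_lagrangian_toy(rho=10.0, outer_iters=20, inner_steps=200, lr=0.05):
    """Toy ALM: minimize f(x) = (x - 2)^2 subject to g(x) = x - 1 <= 0.

    Hypothetical illustration only; not the algorithm from the paper.
    """
    x, lam = 0.0, 0.0  # primal variable and Lagrange multiplier
    for _ in range(outer_iters):
        # Inner loop: gradient descent on the augmented Lagrangian
        #   L(x) = f(x) + (max(0, lam + rho * g(x))**2 - lam**2) / (2 * rho)
        for _ in range(inner_steps):
            shifted = max(0.0, lam + rho * (x - 1.0))
            grad = 2.0 * (x - 2.0) + shifted  # dL/dx, using g'(x) = 1
            x -= lr * grad
        # Outer step: multiplier update, projected onto lam >= 0
        lam = max(0.0, lam + rho * (x - 1.0))
    return x, lam


if __name__ == "__main__":
    x_star, lam_star = augmented_lagrangian_toy()
    print(x_star, lam_star)
```

On this toy problem the iterates approach the constrained optimum x = 1 with multiplier 2, which follows from the stationarity condition 2(x - 2) + lambda = 0 at x = 1. The penalty term keeps the inner problem well conditioned, while the multiplier update removes the residual constraint violation that a pure penalty method would leave for a finite rho.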

Full Text