This paper presents a novel optimal control approach that combines a safe Reinforcement Learning (RL) framework, represented by a Deep Deterministic Policy Gradient (DDPG) algorithm, with the Slime Mould Algorithm (SMA), a representative nature-inspired optimization algorithm. The traditional DDPG-based safe RL optimal control approach has two main drawbacks: possible instability of the control system caused by randomly generated initial values of the controller parameters, and the lack of state safety guarantees in the first iterations of the learning process. Both drawbacks arise because (i) the safety constraints are accounted for only during the DDPG-based training of the controller, which is usually implemented as a neural network (NN), and (ii) the weights and biases of the NN-based controller are initialized with randomly generated values. The proposed approach mitigates these drawbacks by initializing the parameters of the NN-based controller with SMA. The fitness function of the SMA-based initialization incorporates the state safety constraints into the search process, yielding an initial NN-based controller with embedded state safety constraints. The proposed approach is compared with the classical one through real-time experiments, using performance indices popular in optimal reference tracking control problems together with a state safety score.
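The abstract names the technique but does not specify it. The sketch below illustrates only the core idea: a population-based search over controller parameters whose fitness combines tracking error with a state safety penalty, so that the returned parameters already respect the constraints before DDPG training begins. Everything in it is an assumption for illustration: the toy plant, the controller structure, the dimensions, the bound `X_MAX`, and the simplified contraction-based search loop standing in for the full SMA update rules used in the paper.

```python
import numpy as np

# All names and sizes below are hypothetical; the paper's plant model,
# constraint bounds, and SMA hyperparameters are not given in the abstract.
STATE_DIM = 2
PARAM_DIM = 4 * STATE_DIM + 4        # one-hidden-layer controller: W (4x2) + v (4)
X_MAX = 1.5                          # assumed state safety bound |x_i| <= X_MAX

def rollout(theta, steps=50):
    """Simulate a toy closed loop under an NN-like controller
    parameterized by theta; returns the state trajectory."""
    W = theta[:4 * STATE_DIM].reshape(4, STATE_DIM)   # hidden-layer weights
    v = theta[4 * STATE_DIM:]                         # output-layer weights
    x, traj = np.zeros(STATE_DIM), []
    for _ in range(steps):
        u = v @ np.tanh(W @ x)                        # controller output
        x = 0.95 * x + 0.05 * np.array([x[1], u])     # toy second-order plant
        traj.append(x.copy())
    return np.asarray(traj)

def fitness(theta, ref=1.0, penalty=100.0):
    """Tracking cost plus a penalty on state safety violations,
    mirroring the idea of embedding the constraints in the search."""
    traj = rollout(theta)
    tracking = np.sum((traj[:, 0] - ref) ** 2)
    violation = np.sum(np.maximum(np.abs(traj) - X_MAX, 0.0))
    return tracking + penalty * violation

# Simplified population-based search standing in for SMA; the actual
# algorithm uses slime-mould-specific weight and oscillation updates.
rng = np.random.default_rng(0)
pop = rng.uniform(-1.0, 1.0, size=(30, PARAM_DIM))
for it in range(100):
    scores = np.array([fitness(p) for p in pop])
    best = pop[np.argmin(scores)]
    step = 1.0 - it / 100                             # shrinking exploration radius
    pop = best + step * rng.uniform(-1, 1, pop.shape) * (pop - best)
    pop[0] = best                                     # keep the elite candidate
theta0 = best  # safety-aware initial controller parameters for DDPG training
```

Under this reading, `theta0` replaces the random NN initialization criticized in the abstract, which is why the first learning iterations start from a controller that already penalizes constraint violations.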