Abstract

This paper presents a novel optimal reference tracking control approach resulting from the combination of a popular policy gradient Reinforcement Learning (RL) algorithm, namely Proximal Policy Optimization (PPO), and the metaheuristic Slime Mould Algorithm (SMA). One of the most important parameters in the PPO-based RL process is the learning rate, which strongly affects how the parameters of the actor neural network (NN) are iteratively updated. In every episode of the RL process, the updates applied to the weights and biases of the actor NN are scaled by the learning rate, which determines how far the learning agent steps in a direction computed from previous experiences. The classical PPO algorithm usually relies on fixed learning rate values that change rarely, if at all, during the learning process. The main drawback of this choice is that the learning agent cannot exploit positive momentum by accelerating toward good learning experiences, nor slow down and quickly change direction after consecutive negative learning experiences. The main objective of the combination proposed in this paper is to create an adaptive SMA-based PPO approach applied to control systems, which, instead of using fixed learning rate values, employs the SMA to compute optimal learning rate values at each time step of the learning process based on the progress of the learning agent. This paper investigates whether the adaptive SMA-based PPO control approach can be considered an alternative to the classical PPO version, which employs fixed learning rate values. A comparison is carried out using control system performance indices gathered while performing an optimal reference tracking control task on tower crane system laboratory equipment.
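To make the idea concrete, the following minimal Python sketch (illustrative only, not the implementation used in the paper) shows how a simplified Slime-Mould-style search over a small population of candidate learning rates could select the actor learning rate before each PPO update. The function name sma_adapt_learning_rate, the population size, the learning-rate bounds, and the toy fitness function standing in for the per-episode tracking performance are all assumptions introduced for illustration.

# Minimal sketch (illustrative, not the paper's implementation): adapting the
# PPO actor learning rate with a simplified Slime-Mould-style search.
import numpy as np

rng = np.random.default_rng(0)

def sma_adapt_learning_rate(candidates, fitness, t, max_iter,
                            lr_min=1e-6, lr_max=1e-2):
    """One simplified SMA-style iteration over learning-rate candidates.

    candidates : (N,) array of learning rates (search positions)
    fitness    : (N,) array, higher is better (e.g. episode return)
    Returns the updated population and the current best learning rate.
    """
    eps = 1e-12
    best = candidates[np.argmax(fitness)]
    best_f, worst_f = fitness.max(), fitness.min()
    # Simplified slime-mould weighting: better candidates receive larger weights
    w = 1.0 + rng.random(len(candidates)) * np.log1p(
        (fitness - worst_f) / (best_f - worst_f + eps))
    # Oscillation range shrinks as the learning process advances
    a = np.arctanh(max(1.0 - (t + 1) / max_iter, eps))
    vb = rng.uniform(-a, a, len(candidates))
    # Move candidates around the current best, mixing in random partners
    partners = rng.permutation(candidates)
    new_candidates = best + vb * (w * partners - candidates)
    return np.clip(new_candidates, lr_min, lr_max), best

# Usage sketch: one SMA step per episode, before the PPO actor update.
candidates = rng.uniform(1e-5, 1e-3, size=8)
for episode in range(50):
    # Hypothetical fitness: in the paper's setting this would be the tracking
    # performance gathered during the episode; here a toy function peaking at 3e-4.
    fitness = -(candidates - 3e-4) ** 2
    candidates, lr_actor = sma_adapt_learning_rate(candidates, fitness, episode, 50)
    # ... lr_actor would then scale the gradient step applied to the actor NN ...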
