This paper develops a degradation-tolerant optimal controller in the framework of reinforcement learning (RL). Safety-critical and mission-critical systems require control designs that not only maintain system stability and performance specifications but also address incipient degradation. The aim of this work is to slow the progression of degradation by minimizing a cost function that includes both the rate of evolution of degradation and the performance requirements. The controller is developed for discrete-time nonlinear systems that are affine in the control input and whose states are affected by a nonlinear degradation. An approach based on the value iteration (VI) algorithm is developed to find suitable approximations of both the optimal control policy and the optimal cost, while guaranteeing closed-loop stability and minimizing the degradation rate. An offline, model-based adaptive dynamic programming (ADP) algorithm is developed and implemented using an actor-critic structure, which involves training both an actor and a critic neural network (NN). After the actor NN has been trained to approximate the optimal policy, it is deployed in real time to generate the system input. A simulation example demonstrates the efficiency and feasibility of the algorithm.
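For orientation, the value iteration recursion referred to in the abstract can be sketched in its generic textbook form. The notation here is an assumption made for illustration and is not taken from the paper: $z_k$ stacks the system state and the degradation state, $F$ and $G$ denote the control-affine dynamics, and $U$ is a stage cost that penalizes both performance error and the degradation rate.

```latex
% Generic value iteration sketch; all symbols are illustrative assumptions.
% Augmented control-affine dynamics (system state and degradation state stacked in z_k):
\begin{align}
  z_{k+1} &= F(z_k) + G(z_k)\,u_k, \\
% Initialization with a zero value function:
  V_0(z) &= 0, \\
% Policy update: greedy control with respect to the current value estimate:
  u_i(z) &= \arg\min_{u} \bigl\{ U(z,u) + V_i\bigl(F(z) + G(z)\,u\bigr) \bigr\}, \\
% Value update: one-step backup under the updated policy:
  V_{i+1}(z) &= U\bigl(z,u_i(z)\bigr) + V_i\bigl(F(z) + G(z)\,u_i(z)\bigr).
\end{align}
```

In an actor-critic implementation of this kind, the critic NN would typically approximate $V_i$ and the actor NN would approximate $u_i$; the specific cost terms, network structures, and convergence conditions used by the authors are given in the full paper, not in this sketch.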