Convergence and numerical stability of action-dependent heuristic dynamic programming algorithms based on RLS learning for online DLQR optimal control

Patrícia Helena Moraes Rêgo, Guilherme Bonfim De Sousa

doi:10.1504/ijcse.2019.10020394

Abstract

This paper develops a novel approximate dynamic programming (ADP) algorithm, based on recursive least squares (RLS) learning, for approximating the optimal control solution online in real time, and analyses its numerical stability. ADP was developed to make dynamic programming techniques usable in real time, but it carries considerable computational complexity owing to the size of the algorithm's internal matrices and the need to invert some of them. To improve the numerical stability and computational cost of ADP algorithms, specifically in the context of action-dependent heuristic dynamic programming (ADHDP) and optimal control, UDU^T-type unitary transformations are integrated into actor-critic architectures, yielding algorithms better suited to implementation in real-world optimal control systems. The control and stabilisation of an inverted pendulum on a motor-driven cart serves as the study platform for evaluating the convergence and numerical stability of the parameters estimated by the proposed algorithm.
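The abstract does not spell out the update equations, so the following is only a minimal sketch of the general idea. It shows one RLS step in which the covariance matrix is never formed or inverted explicitly but is carried as the factors P = U D U^T, using a Bierman-style UD update. This may differ from the exact formulation used in the paper. The function name `ud_rls_update`, the forgetting factor `lam`, the polynomial regressor and the initial value `1e6` are all assumptions made for illustration.

```python
def ud_rls_update(theta, U, d, phi, y, lam=1.0):
    """One RLS step with the covariance stored as P = U diag(d) U^T.

    theta: current parameter estimate (list of floats)
    U:     unit upper-triangular factor (list of lists), updated in place
    d:     diagonal of D (list of floats), updated in place
    phi:   regressor vector, y: scalar target, lam: forgetting factor
    Returns the updated parameter estimate.
    """
    n = len(theta)
    # f = U^T phi and g = D f (U is upper triangular, so only i <= j contributes)
    f = [sum(U[i][j] * phi[i] for i in range(j + 1)) for j in range(n)]
    g = [d[j] * f[j] for j in range(n)]
    # Prediction error computed with the estimate from the previous step
    err = y - sum(p * t for p, t in zip(phi, theta))

    beta = lam
    v = [0.0] * n
    for j in range(n):
        beta_prev = beta
        beta = beta_prev + f[j] * g[j]
        # Diagonal update: stays positive as long as d[j] > 0 and lam > 0
        d[j] = d[j] * beta_prev / (beta * lam)
        v[j] = g[j]
        mu = -f[j] / beta_prev
        for i in range(j):
            u_old = U[i][j]
            U[i][j] = u_old + v[i] * mu
            v[i] += u_old * v[j]

    # Gain vector is v / beta; no matrix inversion is required
    return [t + (vi / beta) * err for t, vi in zip(theta, v)]


# Usage: recover hypothetical parameters from noiseless data
true_theta = [2.0, -1.0, 0.5]
n = len(true_theta)
theta = [0.0] * n
U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
d = [1e6] * n  # large initial covariance, P(0) = 1e6 * I (assumed)
for k in range(10):
    phi = [1.0, float(k), float(k * k)]
    y = sum(p * t for p, t in zip(phi, true_theta))
    theta = ud_rls_update(theta, U, d, phi, y)
```

Because only U and the positive diagonal d are propagated, the implied covariance remains symmetric and positive definite by construction, which is the usual motivation for UD-factorized forms of RLS.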
