Abstract

Adaptive critic design (ACD) is an efficient way to learn optimal action policies on-line, in which a critic network plays a key role in estimating value functions. Because of their good generalization and simple configuration, kernel-based methods are widely used to construct the critic network. Conventionally, the hyper-parameters of a kernel-based model must be predetermined, but selecting them empirically may mislead kernel-based regression by imposing an improper modeling hypothesis space. To tackle this problem, this paper presents a two-phase iteration of value function approximation and hyper-parameter optimization for Gaussian-kernel based adaptive critic design (GK-ACD), which not only approximates the value functions but also updates the hyper-parameters on-line. Since the two phases are strongly coupled, a theoretical proof based on stochastic approximation derives sufficient conditions that guarantee convergence, and shows that the algorithm's performance depends largely on the design of coordinated learning rates for the two phases. Finally, a series of numerical experiments discusses the necessity of the two-phase updates and the performance under the coordinated learning rates.
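
The sketch below is a minimal illustration of the two-phase idea described above, not the paper's exact GK-ACD algorithm: a Gaussian-kernel critic whose weights are updated by a TD-style rule (phase 1) while the kernel bandwidth, a hyper-parameter, is adapted on a slower time scale (phase 2). The environment, kernel centers, learning-rate schedules `alpha_t` and `beta_t`, and variable names are all illustrative assumptions.

```python
import numpy as np

# Minimal two-phase sketch (assumed setup, not the paper's exact algorithm):
# phase 1 fits critic weights, phase 2 adapts the Gaussian-kernel bandwidth.

rng = np.random.default_rng(0)
centers = rng.uniform(-1.0, 1.0, size=(20, 2))   # fixed kernel centers over a 2-D state space
weights = np.zeros(len(centers))                  # critic weights (phase 1 variables)
log_sigma = np.log(0.5)                           # kernel bandwidth (phase 2 hyper-parameter)
gamma = 0.95                                      # discount factor

def features(s, sigma):
    """Gaussian kernel features k(s, c_i) = exp(-||s - c_i||^2 / (2 sigma^2))."""
    d2 = np.sum((centers - s) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def value(s, sigma):
    return weights @ features(s, sigma)

for t in range(1, 5001):
    # Coordinated learning rates: both decay in a stochastic-approximation style,
    # with the hyper-parameter phase updated on the slower time scale.
    alpha_t = 1.0 / t ** 0.6          # phase 1: value function approximation
    beta_t = 1.0 / t ** 0.9           # phase 2: hyper-parameter optimization

    # Hypothetical one-step transition (s, r, s_next) from some environment.
    s = rng.uniform(-1.0, 1.0, size=2)
    s_next = np.clip(s + rng.normal(scale=0.1, size=2), -1.0, 1.0)
    r = -np.sum(s ** 2)

    sigma = np.exp(log_sigma)
    phi = features(s, sigma)
    td_error = r + gamma * value(s_next, sigma) - weights @ phi

    # Phase 1: TD(0)-style update of the critic weights.
    weights += alpha_t * td_error * phi

    # Phase 2: semi-gradient step on the bandwidth to reduce the squared TD error,
    # treating the bootstrapped target as fixed.
    dphi_dlogsigma = phi * np.sum((centers - s) ** 2, axis=1) / sigma ** 2
    dV_dlogsigma = weights @ dphi_dlogsigma
    log_sigma += beta_t * td_error * dV_dlogsigma
```

The two decaying step sizes mimic the coordination the abstract refers to: the hyper-parameter phase moves slowly enough that, from its point of view, the weight phase has effectively converged, which is the intuition behind the two-time-scale convergence conditions.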
