Abstract

Kernel-based reinforcement learning has received increasing attention because it requires less prior knowledge than linear approximation and neural networks. Online kernel-based updating, however, is hindered by the challenge of catastrophic forgetting, also known as interference. Sparse representation is a key technique for addressing this issue, but existing methods fail to satisfy four criteria simultaneously: learnability, nonprior, nontruncation, and explicitness. In this paper, we present an attentive kernel-based value function approximation that yields a learnable, nonprior, nontruncated, and explicit sparse representation. We propose the online attentive kernel-based temporal difference (OAKTD) algorithm, which employs two-timescale optimization, and provide a convergence analysis for the proposed algorithm. Experimental results show that OAKTD outperforms online kernel-based TD learning algorithms as well as TD learning with Tile Coding on the classical tasks Mountain Car, Acrobot, CartPole, and Puddle World.
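To make the setup concrete, below is a minimal sketch of what a two-timescale, attention-weighted kernel TD update might look like. Everything here is an illustrative assumption rather than the paper's exact OAKTD construction: the class and parameter names (AttentiveKernelTD, alpha_fast, alpha_slow), the Gaussian kernel, the softmax attention, and the simplified slow-timescale attention update are all placeholders for the authors' actual design.

```python
import numpy as np

class AttentiveKernelTD:
    """Hypothetical sketch: kernel-based TD with attention-weighted sparsity
    and two step sizes (two-timescale optimization). Not the paper's exact
    OAKTD algorithm; all design choices below are illustrative assumptions."""

    def __init__(self, centers, gamma=0.99, alpha_fast=0.1,
                 alpha_slow=0.01, bandwidth=1.0):
        self.centers = np.asarray(centers)   # kernel centers c_i, shape (n, d)
        self.w = np.zeros(len(centers))      # value weights (fast timescale)
        self.theta = np.zeros(len(centers))  # attention scores (slow timescale)
        self.gamma = gamma
        self.alpha_fast = alpha_fast
        self.alpha_slow = alpha_slow
        self.bandwidth = bandwidth

    def _kernel(self, s):
        # Gaussian kernel between state s and every center
        d2 = np.sum((self.centers - s) ** 2, axis=1)
        return np.exp(-d2 / (2.0 * self.bandwidth ** 2))

    def _features(self, s):
        # Softmax attention gives a learnable, explicit sparse weighting
        # over the kernel activations (no hand-set truncation threshold)
        k = self._kernel(s)
        att = np.exp(self.theta - self.theta.max())
        att /= att.sum()
        return att * k

    def value(self, s):
        return float(self._features(s) @ self.w)

    def update(self, s, r, s_next, done):
        phi = self._features(s)
        target = r if done else r + self.gamma * self.value(s_next)
        delta = target - phi @ self.w
        # Fast timescale: value weights follow the TD error
        self.w += self.alpha_fast * delta * phi
        # Slow timescale: attention scores move more conservatively
        # (a crude semi-gradient step, for illustration only)
        self.theta += self.alpha_slow * delta * self.w * self._kernel(s)
        return delta
```

The attention step above uses a deliberately crude semi-gradient; the paper's actual two-timescale step-size conditions and convergence analysis are not reproduced by this toy code.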
