Abstract

Traditional least-squares temporal difference (LSTD) algorithms provide an efficient way for policy evaluation, but their performance is greatly influenced by the manual selection of state features and their approximation ability is often limited. To overcome these problems, we propose a multikernel recursive LSTD algorithm in this paper. Different from the previous kernel-based LSTD algorithms, the proposed algorithm uses Bellman operator along with projection operator, and constructs the sparse dictionary online. To avoid caching all history samples and reduce the computational cost, it uses the sliding-window technique. To avoid overfitting and reduce the bias caused by the sliding window, it also considers \( L_{2} \) regularization. In particular, to improve the approximation ability, it uses the multikernel technique, which may be the first time to be used for value-function prediction. Experimental results on a 50-state chain problem show the good performance of the proposed algorithm in terms of convergence speed and prediction accuracy.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call