Online selective kernel-based temporal difference learning.

Xingguo Chen, Yang Gao, Ruili Wang

doi:10.1109/tnnls.2013.2270561

Abstract

In this paper, an online selective kernel-based temporal difference (OSKTD) learning algorithm is proposed to deal with large-scale and/or continuous reinforcement learning problems. OSKTD includes two online procedures: online sparsification and parameter updating for the selective kernel-based value function. A new sparsification method (i.e., a kernel distance-based online sparsification method) is proposed based on selective ensemble learning; it is computationally less complex than other sparsification methods. With the proposed method, the sparsified dictionary of samples is constructed online by checking whether each incoming sample needs to be added to the dictionary. In addition, based on local validity, a selective kernel-based value function is proposed that selects the best samples from the sample dictionary for the value function approximator. The parameters of the selective kernel-based value function are iteratively updated using the temporal difference (TD) learning algorithm combined with gradient descent. The complexity of the online sparsification procedure in the OSKTD algorithm is O(n). Two typical experiments (Maze and Mountain Car) are used to compare OSKTD with both traditional and up-to-date O(n) algorithms (GTD, GTD2, and TDC using the kernel-based value function), and the results demonstrate the effectiveness of the proposed algorithm. In the Maze problem, OSKTD converges to an optimal policy and converges faster than both the traditional and the up-to-date algorithms. In the Mountain Car problem, OSKTD converges, requires less computation time than other sparsification methods, reaches a better local optimum than the traditional algorithms, and converges much faster than the up-to-date algorithms. OSKTD also reaches a final optimum that is competitive with the up-to-date algorithms.
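
The abstract gives no formulas, so the following Python is only a rough sketch of how the two online procedures might fit together, not the authors' implementation. It assumes a Gaussian kernel, a squared feature-space distance k(x,x) - 2k(x,y) + k(y,y), a semi-gradient TD(0) update, and two hypothetical thresholds: `mu1` (add a sample only if it is farther than this from every dictionary entry) and `mu2` (a dictionary entry contributes to the value only if it is closer than this). The class name, method names, and default values are all illustrative assumptions.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two state vectors (assumed kernel choice)."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


def kernel_distance(x, y, sigma=1.0):
    """Squared feature-space distance; k(x, x) = 1 for the Gaussian kernel."""
    return 2.0 - 2.0 * gaussian_kernel(x, y, sigma)


class SelectiveKernelTD:
    """Illustrative sketch: online sparsification plus selective kernel TD(0)."""

    def __init__(self, mu1=0.1, mu2=0.5, alpha=0.1, gamma=0.9, sigma=1.0):
        self.mu1 = mu1        # hypothetical threshold for adding a sample
        self.mu2 = mu2        # hypothetical local-validity threshold
        self.alpha = alpha    # step size
        self.gamma = gamma    # discount factor
        self.sigma = sigma    # kernel width
        self.dictionary = []  # sparsified sample dictionary
        self.theta = []       # one weight per dictionary entry

    def sparsify(self, s):
        """Add s if it is far from every stored sample: one O(n) scan."""
        if all(kernel_distance(s, d, self.sigma) > self.mu1
               for d in self.dictionary):
            self.dictionary.append(tuple(s))
            self.theta.append(0.0)
            return True
        return False

    def features(self, s):
        """Kernel features, zeroed for entries outside the local region."""
        return [
            gaussian_kernel(s, d, self.sigma)
            if kernel_distance(s, d, self.sigma) < self.mu2 else 0.0
            for d in self.dictionary
        ]

    def value(self, s):
        """Selective kernel-based value estimate for state s."""
        return sum(t * f for t, f in zip(self.theta, self.features(s)))

    def update(self, s, r, s_next, terminal=False):
        """One semi-gradient TD(0) step; returns the TD error."""
        self.sparsify(s)
        if not terminal:
            self.sparsify(s_next)
        phi = self.features(s)
        target = r if terminal else r + self.gamma * self.value(s_next)
        delta = target - self.value(s)
        for i, f in enumerate(phi):
            self.theta[i] += self.alpha * delta * f
        return delta
```

In this sketch each call to `update` scans the dictionary a constant number of times, so the per-step cost grows linearly with the dictionary size, in line with the O(n) figure the abstract quotes for sparsification.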

Journal: IEEE Transactions on Neural Networks and Learning Systems
Publication Date: Dec 1, 2013
Citations: 26
