A Gradient Descent Sarsa(λ) Algorithm Based on the Adaptive Reward-shaping Mechanism

Quan Liu, Qiming Fu, Fei Xiao, Yuchen Fu

doi:10.1080/10798587.2013.869119

Abstract

We propose a novel gradient descent (GD) Sarsa(λ) algorithm, based on an adaptive reward-shaping mechanism, to address poor initial performance and slow convergence in reinforcement learning tasks with continuous state spaces. An adaptive normalized radial basis function (ANRBF) network is used to shape the reward. The reward-shaping mechanism propagates model knowledge to the learner as an additional reward signal, which improves both initial performance and convergence speed. Building on the ANRBF network, we propose a function approximation algorithm named ANRBF-GD-Sarsa(λ) and analyze its convergence theoretically. Experiments show that the proposed algorithm achieves good initial performance and fast convergence.
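As a rough illustration of the ingredients named in the abstract, the following sketch combines normalized Gaussian RBF features, a shaped reward, and one gradient-descent Sarsa(λ) update with accumulating eligibility traces. It is not the ANRBF-GD-Sarsa(λ) algorithm itself: the abstract does not specify how the network adapts or how the shaping signal is computed, so the shaping term here is assumed to take the standard potential-based form r + γΦ(s') − Φ(s), and all function names, parameters, and the fixed RBF centers are hypothetical.

```python
import math

def nrbf_features(state, centers, width):
    """Normalized Gaussian RBF activations for a scalar state (fixed centers, assumed)."""
    acts = [math.exp(-((state - c) ** 2) / (2.0 * width ** 2)) for c in centers]
    total = sum(acts)
    return [a / total for a in acts]

def shaped_reward(reward, potential_s, potential_next, gamma):
    """Assumed potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * potential_next - potential_s

def sarsa_lambda_step(theta, trace, feat, action, reward,
                      next_feat, next_action, alpha, gamma, lam, done=False):
    """One linear GD Sarsa(lambda) update; theta and trace are modified in place."""
    q = sum(w * f for w, f in zip(theta[action], feat))
    q_next = 0.0 if done else sum(
        w * f for w, f in zip(theta[next_action], next_feat))
    delta = reward + gamma * q_next - q  # temporal-difference error
    for a in range(len(theta)):
        for i in range(len(feat)):
            trace[a][i] *= gamma * lam       # decay every trace
            if a == action:
                trace[a][i] += feat[i]       # accumulate gradient of Q(s, a)
            theta[a][i] += alpha * delta * trace[a][i]
    return delta
```

In a full learner, the value passed as `reward` would be the output of `shaped_reward`, with the potential supplied by whatever model knowledge is available; the paper's adaptive network would take the place of the fixed centers and the placeholder potential used here.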

Full Text