Abstract

The task of learning the value function under a fixed policy in continuous Markov decision processes (MDPs) is considered. Although the extreme learning machine (ELM) learns quickly and avoids the tuning issues of traditional artificial neural networks (ANNs), the randomness of its parameters leads to fluctuating performance. In this paper, a least-squares temporal difference algorithm with eligibility traces based on the regularized extreme learning machine (RELM-LSTD(λ)) is proposed to overcome these problems in reinforcement learning. The proposed algorithm combines the LSTD(λ) algorithm with an RELM, which is used to approximate the value function, and an eligibility trace term is introduced to increase data efficiency. In experiments, the performance of the proposed algorithm is demonstrated and compared with that of LSTD and ELM-LSTD. The results show that the proposed algorithm achieves more stable and better performance in approximating the value function under a fixed policy.
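To make the combination concrete, the following is a minimal sketch of LSTD(λ) policy evaluation using a fixed random hidden layer as an ELM-style basis and a ridge term standing in for the RELM regularization. All names, dimensions, and constants (e.g. `n_hidden`, `ridge`) are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

state_dim, n_hidden = 4, 50           # assumed sizes of the state and hidden layer
gamma, lam, ridge = 0.99, 0.8, 1e-3   # discount, trace decay, RELM-style ridge term

# ELM hidden layer: weights and biases are drawn once at random and never trained.
W = rng.normal(size=(n_hidden, state_dim))
c = rng.normal(size=n_hidden)

def features(s):
    """Sigmoid hidden-layer activations used as the value-function basis."""
    return 1.0 / (1.0 + np.exp(-(W @ s + c)))

def lstd_lambda(transitions):
    """Estimate output weights theta so that V(s) ~= features(s) @ theta.

    `transitions` is a list of (s, r, s_next, done) tuples collected under
    the fixed policy being evaluated.
    """
    A = np.zeros((n_hidden, n_hidden))
    b = np.zeros(n_hidden)
    z = np.zeros(n_hidden)                         # eligibility trace
    for s, r, s_next, done in transitions:
        phi = features(s)
        phi_next = np.zeros(n_hidden) if done else features(s_next)
        z = gamma * lam * z + phi                  # accumulate the trace
        A += np.outer(z, phi - gamma * phi_next)
        b += z * r
        if done:
            z = np.zeros(n_hidden)                 # restart the trace each episode
    # The ridge term plays the role of the RELM regularization and keeps A well conditioned.
    theta = np.linalg.solve(A + ridge * np.eye(n_hidden), b)
    return theta
```

In this sketch the only trained quantities are the output weights `theta`, solved in closed form, which is what gives the ELM/RELM approach its fast learning speed relative to gradient-trained ANNs; the regularization term is what damps the performance fluctuations caused by the random hidden layer.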
