Abstract
Temporal difference learning and eligibility traces are two mechanisms for solving reinforcement learning problems. The temporal difference technique bootstraps the state value or state-action value at every step, as in dynamic programming, and learns by sampling episodes from experience, as in the Monte Carlo approach. Eligibility traces are a mechanism for recording the degree to which each state is eligible to undergo the learning update. This paper investigates the underlying mechanism of eligibility-trace strategies using on-policy and off-policy learning algorithms. To do so, performance metrics are obtained by defining the learning problem in a simulation environment and applying different learning algorithms. However, measuring learning performance and analysing sensitivity are expensive, because such metrics can only be obtained by running experiments with different parameter values. This paper therefore proposes a comparative study of the mechanism of eligibility traces, with the objective of comparing these approaches and investigating their influence on performance.
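As an illustration of the mechanism the abstract describes, the following is a minimal sketch of tabular on-policy SARSA(λ) with accumulating eligibility traces: the TD error bootstraps from the successor state-action value, and every state-action pair is updated in proportion to its current eligibility, which decays by γλ each step. The environment interface (`reset()` returning a state index, `step(a)` returning `(next_state, reward, done)`) and all parameter values are assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np

def sarsa_lambda(env, n_states, n_actions, episodes=500,
                 alpha=0.1, gamma=0.99, lam=0.9, epsilon=0.1):
    """Tabular SARSA(lambda) with accumulating eligibility traces (illustrative sketch)."""
    Q = np.zeros((n_states, n_actions))

    def epsilon_greedy(s):
        # Behaviour policy: mostly greedy, with epsilon-random exploration.
        if np.random.rand() < epsilon:
            return np.random.randint(n_actions)
        return int(np.argmax(Q[s]))

    for _ in range(episodes):
        E = np.zeros_like(Q)              # eligibility trace per state-action pair
        s = env.reset()                   # assumed interface: returns a state index
        a = epsilon_greedy(s)
        done = False
        while not done:
            s_next, r, done = env.step(a) # assumed interface: (state, reward, done)
            a_next = epsilon_greedy(s_next)
            # TD error: bootstrap from the successor state-action value.
            delta = r + gamma * Q[s_next, a_next] * (not done) - Q[s, a]
            # Mark the visited pair as eligible, then update all pairs
            # in proportion to their current eligibility.
            E[s, a] += 1.0
            Q += alpha * delta * E
            # Decay every trace so recently visited pairs remain more eligible.
            E *= gamma * lam
            s, a = s_next, a_next
    return Q
```

The off-policy counterpart (e.g. Watkins's Q(λ)) differs mainly in bootstrapping from the greedy action and cutting traces after exploratory actions.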