Abstract

Temporal difference (TD) learning is an incremental learning method that learns by accumulating prediction errors over time. The trade-off between exploration and exploitation has a significant impact on the performance of TD learning, as does the choice of TD parameters. However, measuring learning performance and analyzing parameter sensitivity are difficult and expensive: such performance metrics are normally obtained only by running an extensive set of experiments with different parameter values. In this paper, we address the convergence and parameter-tuning issues of TD learning using ‘design of experiments’ techniques. We present a modified Sarsa(λ) control algorithm that samples actions in conjunction with the simulated annealing (SA) method, and we tackle parameter selection using the response surface methodology (RSM). A soccer game, which has a large, dynamic and continuous state space, serves as the simulation environment. Convergence is investigated in terms of performance metrics obtained through the soccer game simulation. The empirical results demonstrate that the quality of convergence is significantly improved by the SA and RSM techniques.
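The paper's pseudocode is not reproduced here, so the following is a minimal tabular Python sketch of the idea the abstract describes: Sarsa(λ) whose exploration is governed by a simulated-annealing schedule, with actions drawn from a Boltzmann (softmax) distribution over Q-values whose temperature cools over episodes. The `env.reset()`/`env.step()` interface, all parameter defaults, and the per-episode cooling schedule are illustrative assumptions, not the paper's implementation (which runs in a continuous soccer simulation).

```python
import math
import random
from collections import defaultdict

def boltzmann_action(Q, state, actions, T):
    """Sample an action from a softmax over Q-values. The temperature T
    plays the simulated-annealing role: high T gives near-uniform
    exploration, low T approaches greedy exploitation."""
    prefs = [Q[(state, a)] / T for a in actions]
    m = max(prefs)                              # subtract max for stability
    weights = [math.exp(p - m) for p in prefs]
    r, acc = random.random() * sum(weights), 0.0
    for a, w in zip(actions, weights):
        acc += w
        if r <= acc:
            return a
    return actions[-1]

def sarsa_lambda_sa(env, actions, episodes=500,
                    alpha=0.1, gamma=0.95, lam=0.8,
                    T0=1.0, cooling=0.99, T_min=0.01):
    """Hypothetical env interface: env.reset() -> state,
    env.step(action) -> (next_state, reward, done)."""
    Q = defaultdict(float)            # Q[(state, action)] -> value
    T = T0                            # annealing temperature
    for _ in range(episodes):
        E = defaultdict(float)        # eligibility traces, reset per episode
        s = env.reset()
        a = boltzmann_action(Q, s, actions, T)
        done = False
        while not done:
            s2, r, done = env.step(a)
            a2 = boltzmann_action(Q, s2, actions, T)
            # TD error for the (s, a) -> (s2, a2) transition
            delta = r + (0.0 if done else gamma * Q[(s2, a2)]) - Q[(s, a)]
            E[(s, a)] += 1.0          # accumulating trace
            for key in list(E):
                Q[key] += alpha * delta * E[key]
                E[key] *= gamma * lam
            s, a = s2, a2
        T = max(T_min, T * cooling)   # cool after each episode
    return Q
```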
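The RSM step can be illustrated independently of the learner. A common second-order RSM workflow fits a quadratic model to performance measured at a set of design points and then solves for the stationary point of the fitted surface. The sketch below assumes two tuned parameters (say α and λ); the actual factors, design points, and performance metric used in the paper are not recoverable from this abstract.

```python
import numpy as np

def fit_response_surface(X, y):
    """Fit a second-order response surface
    y ≈ b0 + b1*x1 + b2*x2 + b11*x1^2 + b22*x2^2 + b12*x1*x2
    by ordinary least squares.
    X: (n, 2) array of parameter settings, e.g. (alpha, lambda);
    y: (n,) measured performance, e.g. average episode reward."""
    x1, x2 = X[:, 0], X[:, 1]
    A = np.column_stack([np.ones_like(x1), x1, x2,
                         x1**2, x2**2, x1 * x2])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def stationary_point(coef):
    """Solve grad = 0 for the fitted quadratic: the candidate
    optimum of the response surface (check it is a maximum)."""
    _, b1, b2, b11, b22, b12 = coef
    B = np.array([[2 * b11, b12], [b12, 2 * b22]])
    return np.linalg.solve(B, np.array([-b1, -b2]))
```

In use, one would evaluate the learner at each design point (e.g., a central composite design over α and λ), fit the surface, and take the stationary point as the next candidate parameter setting, iterating if needed.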
