Value Function Dynamic Estimation in Reinforcement Learning based on Data Adequacy

Huifan Gao, Jing Tang, Langcai Cao, Yifeng Zeng, Yinghui Pan, Peihua Chai

doi:10.1145/3409501.3409517

Abstract

In recent years, reinforcement learning has played an important role in the study of decision-making problems in computer games. To better estimate the value function with limited computational resources, this paper proposes a dynamic value function estimation method based on data adequacy. Because the complexity of each state in the MDP model varies, the proposed method estimates the value function dynamically, in contrast to the fixed estimation used in traditional methods. Using the PigChase challenge of the Malmo project launched by Microsoft in 2017 as a testbed, we compare the new method with existing techniques. Experimental results show that the proposed algorithm outperforms traditional algorithms.

Full Text