Abstract

In recent years, reinforcement learning has played an important role in the study of decision problems in computer games. To address the question of how to better estimate the value function with limited computational resources, this paper proposes a dynamic value function estimation method based on data adequacy. Because the complexity of each state in the Markov decision process (MDP) model varies, the proposed method estimates the value function dynamically, in contrast to the fixed value function estimation used in traditional methods. We compare the new method with existing techniques on the PigChase challenge of the Malmo project launched by Microsoft in 2017. Experimental results show that the proposed algorithm outperforms the traditional algorithms.

