Abstract

Monte-Carlo Tree Search (MCTS) has achieved great success in combinatorial games, which are characterized by finite state and action spaces, deterministic state transitions, and sparse rewards. AlphaGo, which combined MCTS with deep neural networks, defeated the world champion Lee Sedol at Go, demonstrating the strength of tree search in combinatorial games with enormous search spaces. However, when the state space is continuous and transitions involve chance, tree search methods such as UCT fail: each state is revisited with probability zero, so the statistics stored in the tree are never reused and UCT degrades to plain Monte Carlo rollouts. Moreover, experience gathered in earlier explorations cannot be reused to improve subsequent tree searches, which greatly increases the demand for computing resources. To address this problem, this paper proposes a step-by-step Reverse Curriculum Learning with Truncated Tree Search method (RevCuT Tree Search). To retain previous exploration experience, we use a deep neural network to learn the state-action values at explored states and use it to guide subsequent tree searches. In addition, to limit the computational cost, we build a truncated search tree over the continuous state space rather than over the whole trajectory. This method effectively reduces the number of explorations required and achieves performance beyond the human level in our purpose-built single-player game with a continuous state space and probabilistic state transitions.
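To make the core idea concrete, below is a minimal, hypothetical Python sketch of a depth-truncated tree search whose leaf values come from a learned state-action value function instead of full-trajectory rollouts. The environment transition `step`, the value function `q_value`, the action set `ACTIONS`, and all hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import math
import random

# Illustrative sketch only: a truncated tree search in which leaf evaluation
# comes from a learned Q(s, a) instead of rolling out to the end of the episode.
# All names and dynamics below are stand-ins, not the paper's API.

ACTIONS = [0, 1, 2, 3]

def q_value(state, action):
    # Stand-in for the trained neural network Q(s, a); here a dummy heuristic.
    return -abs(sum(state) + action - 10.0)

def step(state, action):
    # Stand-in for a stochastic transition over a continuous state space.
    noise = random.gauss(0.0, 0.1)
    next_state = tuple(s + 0.1 * action + noise for s in state)
    reward = -0.01
    return next_state, reward

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}   # action -> Node
        self.visits = 0
        self.total = 0.0     # sum of backed-up returns

    def value(self):
        return self.total / self.visits if self.visits else 0.0

def select_action(node, c=1.4):
    # UCT over the discrete actions; unexpanded actions are tried first.
    def score(a):
        child = node.children.get(a)
        if child is None or child.visits == 0:
            return float("inf")
        return child.value() + c * math.sqrt(math.log(node.visits + 1) / child.visits)
    return max(ACTIONS, key=score)

def simulate(node, depth, depth_limit, gamma=0.99):
    # Expand until the truncation depth, then bootstrap from the learned Q
    # instead of continuing along the whole trajectory.
    if depth == depth_limit:
        return max(q_value(node.state, a) for a in ACTIONS)
    a = select_action(node)
    next_state, reward = step(node.state, a)
    # For brevity, each action child caches the first sampled successor state;
    # a fuller treatment would handle the stochastic branching explicitly.
    child = node.children.setdefault(a, Node(next_state))
    ret = reward + gamma * simulate(child, depth + 1, depth_limit, gamma)
    child.visits += 1
    child.total += ret
    node.visits += 1
    return ret

def plan(state, depth_limit=4, n_sims=200):
    root = Node(state)
    for _ in range(n_sims):
        simulate(root, 0, depth_limit)
    return max(ACTIONS, key=lambda a: root.children[a].value()
               if a in root.children else float("-inf"))

if __name__ == "__main__":
    print("chosen action:", plan((0.0, 0.0)))
```

Because every sampled successor is effectively new in a continuous state space, the truncation at `depth_limit` plus the learned leaf evaluation is what keeps such a tree shallow and its statistics reusable across searches.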
