Abstract
Recently, the use of reinforcement-learning algorithms has been proposed to create value and policy functions, and their effectiveness has been demonstrated in Go, Chess, and Shogi. In previous studies, the policy function was trained to predict the search probabilities of each move output by Monte Carlo tree search; thus, a large number of simulations were required to obtain the search probabilities. We propose a reinforcement-learning algorithm based on self-play games that creates value and policy functions such that the policy function is trained directly from the game results, without the search probabilities. In this study, we use Hex, a board game developed by Piet Hein, to evaluate the proposed method. We demonstrate the effectiveness of the proposed learning algorithm in terms of policy function accuracy, and play a tournament between the proposed computer Hex program, DeepEZO, and the 2017 world-champion programs. The tournament results demonstrate that DeepEZO outperforms all other programs. DeepEZO achieved a winning percentage of 79.3% against the world-champion program MoHex2.0 under the same search conditions on a $13 \times 13$ board. We also show that highly accurate policy functions can be created by training the policy functions to increase the number of moves to be searched in losing positions.
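To make the training target concrete, the following is a minimal conceptual sketch, not the authors' DeepEZO implementation: a per-position policy loss (hypothetical `policy_loss`, written here in PyTorch) that fits the abstract's description, pushing probability toward the played move when the player to move won, and toward a broader distribution over legal moves (more moves worth searching) when that player lost. The exact loss form, masking, and function names are assumptions for illustration.

```python
# Hedged sketch of a policy loss trained directly from game results
# (no MCTS search probabilities). NOT the authors' exact method.
import torch
import torch.nn.functional as F

def policy_loss(logits: torch.Tensor,
                played_move: int,
                legal_mask: torch.Tensor,
                won: bool) -> torch.Tensor:
    """logits: (num_cells,) raw policy outputs for one position.
    played_move: index of the move actually played in self-play.
    legal_mask: bool tensor, True for empty (legal) cells.
    won: True if the player to move eventually won the game."""
    # Restrict the softmax to legal moves.
    masked = logits.masked_fill(~legal_mask, float('-inf'))
    log_probs = F.log_softmax(masked, dim=-1)
    if won:
        # Winning side: standard cross-entropy toward the played move.
        return -log_probs[played_move]
    # Losing side: cross-entropy toward a uniform target over legal moves,
    # which spreads probability mass and increases the number of moves
    # the policy considers worth searching in losing positions.
    return -log_probs[legal_mask].mean()
```

In this reading, the self-play outcome alone supplies the training signal, so no simulations are needed to produce per-move search probabilities as targets.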