Abstract

Since the chess program Deep Blue defeated the human world chess champion, interest in computer game playing has increasingly turned to shogi. Shogi, however, has a larger search space than chess, and captured pieces can be returned to play. To address these difficulties, we propose a reinforcement learning method based on self-play for obtaining a static evaluation function, that is, a mapping from shogi positions to real values. Our method builds on temporal difference learning, developed by R. Sutton and applied to backgammon by G. Tesauro. In our method, a neural network that takes a board description of a shogi position as input and outputs the winning probability from that position is trained by self-play alone, without any built-in shogi knowledge. Computational experiments are presented to demonstrate the effectiveness of the learned evaluation function.
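
The abstract does not specify the network architecture or the shogi board encoding, but the following minimal Python sketch illustrates the kind of TD(λ) self-play update it describes: a small neural network maps a position feature vector to a winning probability, and after each self-play game the weights are nudged toward the next position's value (or the final outcome). All sizes, hyperparameters, and helper names here are illustrative assumptions, not the paper's actual method.

```python
# Minimal TD(lambda) value learning by self-play, in the spirit of
# Sutton's TD learning and Tesauro's TD-Gammon. Feature sizes,
# learning rate, and lambda are assumed values for illustration.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 64          # assumed size of the board feature vector
N_HIDDEN = 32            # assumed hidden-layer width
ALPHA, LAMBDA = 0.01, 0.7

# One-hidden-layer network: V(s) = sigmoid(w2 . sigmoid(W1 s + b1) + b2)
W1 = rng.normal(0, 0.1, (N_HIDDEN, N_FEATURES))
b1 = np.zeros(N_HIDDEN)
w2 = rng.normal(0, 0.1, N_HIDDEN)
b2 = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def value_and_grads(s):
    """Winning probability V(s) and its gradients w.r.t. all weights."""
    h = sigmoid(W1 @ s + b1)
    v = sigmoid(w2 @ h + b2)
    dv = v * (1.0 - v)                   # derivative of the output sigmoid
    g_w2 = dv * h
    g_b2 = dv
    dh = dv * w2 * h * (1.0 - h)         # backprop through the hidden layer
    g_W1 = np.outer(dh, s)
    g_b1 = dh
    return v, (g_W1, g_b1, g_w2, g_b2)

def td_lambda_update(features, outcome):
    """One TD(lambda) pass over the positions of a finished self-play game.

    features: list of feature vectors, one per position, in game order.
    outcome:  1.0 if the player whose probability V estimates won, else 0.0.
    """
    global W1, b1, w2, b2
    traces = [np.zeros_like(W1), np.zeros_like(b1), np.zeros_like(w2), 0.0]
    v_prev, grads_prev = value_and_grads(features[0])
    for t in range(1, len(features) + 1):
        # Target is the next position's value, or the game outcome at the end.
        v_next = outcome if t == len(features) else value_and_grads(features[t])[0]
        delta = v_next - v_prev
        for i, g in enumerate(grads_prev):
            traces[i] = LAMBDA * traces[i] + g    # decaying eligibility traces
        W1 += ALPHA * delta * traces[0]
        b1 += ALPHA * delta * traces[1]
        w2 += ALPHA * delta * traces[2]
        b2 += ALPHA * delta * traces[3]
        if t < len(features):
            v_prev, grads_prev = value_and_grads(features[t])
```

A caller would encode each position of a self-play game into a feature vector (via some hypothetical encode_position helper) and invoke td_lambda_update(features, outcome) once per finished game; repeating this over many games is what gradually shapes the network into a static evaluation function.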
