Abstract

Since the chess program Deep Blue defeated the human world chess champion, interest in computer game playing has increasingly turned to shogi. Shogi, however, has a larger search space than chess, and captured pieces can be returned to play. To address these difficulties, we propose a reinforcement learning method based on self-play for obtaining a static evaluation function, that is, a mapping from shogi positions to real values. Our method builds on temporal difference learning, developed by R. Sutton and applied to backgammon by G. Tesauro. In our method, a neural network that takes a board description of a shogi position as input and outputs the winning probability from that position is trained by self-play alone, without any built-in shogi knowledge. Computational experiments are presented to demonstrate the effectiveness of the learned evaluation function.
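
The abstract does not specify the network architecture or the shogi board encoding, but the following minimal Python sketch illustrates the kind of TD(λ) self-play update it describes: a small neural network maps a position feature vector to a winning probability, and after each self-play game the weights are nudged toward the next position's value (or the final outcome). All sizes, hyperparameters, and helper names here are illustrative assumptions, not the paper's actual method.

```python
# Minimal TD(lambda) value learning by self-play, in the spirit of
# Sutton's TD learning and Tesauro's TD-Gammon. Feature sizes,
# learning rate, and lambda are assumed values for illustration.
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 64          # assumed size of the board feature vector
N_HIDDEN = 32            # assumed hidden-layer width
ALPHA, LAMBDA = 0.01, 0.7

# One-hidden-layer network: V(s) = sigmoid(w2 . sigmoid(W1 s + b1) + b2)
W1 = rng.normal(0, 0.1, (N_HIDDEN, N_FEATURES))
b1 = np.zeros(N_HIDDEN)
w2 = rng.normal(0, 0.1, N_HIDDEN)
b2 = 0.0

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def value_and_grads(s):
    """Winning probability V(s) and its gradients w.r.t. all weights."""
    h = sigmoid(W1 @ s + b1)
    v = sigmoid(w2 @ h + b2)
    dv = v * (1.0 - v)                   # derivative of the output sigmoid
    g_w2 = dv * h
    g_b2 = dv
    dh = dv * w2 * h * (1.0 - h)         # backprop through the hidden layer
    g_W1 = np.outer(dh, s)
    g_b1 = dh
    return v, (g_W1, g_b1, g_w2, g_b2)

def td_lambda_update(features, outcome):
    """One TD(lambda) pass over the positions of a finished self-play game.

    features: list of feature vectors, one per position, in game order.
    outcome:  1.0 if the player whose probability V estimates won, else 0.0.
    """
    global W1, b1, w2, b2
    traces = [np.zeros_like(W1), np.zeros_like(b1), np.zeros_like(w2), 0.0]
    v_prev, grads_prev = value_and_grads(features[0])
    for t in range(1, len(features) + 1):
        # Target is the next position's value, or the game outcome at the end.
        v_next = outcome if t == len(features) else value_and_grads(features[t])[0]
        delta = v_next - v_prev
        for i, g in enumerate(grads_prev):
            traces[i] = LAMBDA * traces[i] + g    # decaying eligibility traces
        W1 += ALPHA * delta * traces[0]
        b1 += ALPHA * delta * traces[1]
        w2 += ALPHA * delta * traces[2]
        b2 += ALPHA * delta * traces[3]
        if t < len(features):
            v_prev, grads_prev = value_and_grads(features[t])
```

A caller would encode each position of a self-play game into a feature vector (via some hypothetical encode_position helper) and invoke td_lambda_update(features, outcome) once per finished game; repeating this over many games is what gradually shapes the network into a static evaluation function.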
