Abstract

Evaluation functions are crucial for building strong computer players in two-player games such as chess, Go, and shogi. Although a linear combination of a large number of features has been a popular representation of evaluation functions in shogi, deep neural networks (DNNs) have recently come to be considered more promising, following the success of AlphaZero in multiple domains: chess, Go, and shogi. This paper shows that three loss functions, namely the loss in comparison training, temporal difference (TD) errors, and the cross-entropy loss in win prediction, are effective for training evaluation functions in shogi represented by deep neural networks. In the training of DNNs in AlphaZero, the main loss function consists only of win prediction, though it is augmented with move prediction for regularization. In traditional shogi programs, on the other hand, various losses, including the loss in comparison training, TD errors, and the cross-entropy loss in win prediction, have contributed to accurate evaluation functions that are linear combinations of a large number of features. It is therefore promising to combine these loss functions and apply them to the training of modern DNNs. In our experiments, training with combinations of these loss functions improved the accuracy of evaluation functions represented by DNNs. The performance of the trained evaluation functions is tested through top-1 accuracy, 1-1 accuracy, and self-play.
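As a rough illustration of what combining the three losses could look like, the following Python sketch computes a weighted sum for a single training position. It is not the paper's implementation: the function names, the sigmoid-based forms of each loss, the weights, and the assumption that all values are scalar evaluations from the same player's perspective are hypothetical choices made only for this example.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw evaluation to a win probability (assumed squashing function)."""
    return 1.0 / (1.0 + math.exp(-x))


def comparison_loss(expert_value: float, sibling_values: list[float]) -> float:
    """Assumed comparison-training loss: penalize sibling moves whose
    evaluation approaches or exceeds that of the move the expert played."""
    if not sibling_values:
        return 0.0
    total = sum(sigmoid(v - expert_value) for v in sibling_values)
    return total / len(sibling_values)


def td_loss(value: float, next_value: float) -> float:
    """Assumed TD error: squared gap between the predicted win probability
    of the current position and that of its successor position."""
    return (sigmoid(value) - sigmoid(next_value)) ** 2


def win_loss(value: float, won: int) -> float:
    """Cross-entropy between the predicted win probability and the game
    outcome (won is 1 for a win, 0 for a loss)."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(sigmoid(value), eps), 1.0 - eps)
    return -(won * math.log(p) + (1 - won) * math.log(1.0 - p))


def combined_loss(
    value: float,
    next_value: float,
    expert_value: float,
    sibling_values: list[float],
    won: int,
    w_cmp: float = 1.0,  # hypothetical weights, not taken from the paper
    w_td: float = 1.0,
    w_win: float = 1.0,
) -> float:
    """Weighted sum of the three losses for one training position."""
    return (
        w_cmp * comparison_loss(expert_value, sibling_values)
        + w_td * td_loss(value, next_value)
        + w_win * win_loss(value, won)
    )


if __name__ == "__main__":
    # Toy numbers only; a real setup would take these values from a DNN.
    print(combined_loss(0.3, 0.5, 0.6, [0.2, -0.1, 0.7], won=1))
```

In an actual training setup the evaluations would come from the network and the loss would be minimized with an automatic-differentiation framework. The scalar version above is only meant to show how the three terms can be added together.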
