Abstract

Research on computer games for Go, Chess, and Japanese Chess (Shogi) stands out as one of the notable landmarks in the progress of artificial intelligence. The AlphaGo, AlphaGo Zero, and AlphaZero algorithms, referred to in some literature as AlphaZero-style (AZ-style) algorithms [1], have achieved superhuman performance by using deep reinforcement learning (DRL). However, the unavailability of training details, the expensive equipment required for model training, and the low evaluation accuracy caused by slow self-play training when such equipment is not available have been drawbacks of AZ-style algorithms in practical applications. To address these problems to a certain extent, this paper proposes an improved online sequential extreme learning machine (IOS-ELM), a new method for evaluating chess board positions in AZ-style algorithms. First, the theoretical principles of IOS-ELM are given. Second, the study takes Gomoku as the application object, uses IOS-ELM as the evaluation method for AZ-style board positions, and discusses in detail the loss during training and the hyperparameters affecting performance. Under the same experimental conditions, the proposed method reduces the number of training parameters by a factor of 14, cuts training time to 15%, and lowers the evaluation error by 13% compared with the board evaluation network used in the original AZ-style algorithms.
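For readers unfamiliar with the underlying technique, the sketch below shows a minimal standard online sequential extreme learning machine (OS-ELM) regressor in NumPy. It is an illustrative baseline only, not the paper's improved IOS-ELM; the class and parameter names (`OSELM`, `n_hidden`, the ridge term) are our assumptions for the example.

```python
import numpy as np

class OSELM:
    """Standard OS-ELM regressor (minimal illustrative sketch, not the paper's IOS-ELM)."""

    def __init__(self, n_input, n_hidden, n_output, seed=0):
        rng = np.random.default_rng(seed)
        # Random input weights and biases are fixed once; only beta is learned (ELM principle).
        self.W = rng.standard_normal((n_input, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.P = None                               # running inverse of H^T H (plus a small ridge term)
        self.beta = np.zeros((n_hidden, n_output))  # output weights, updated sequentially

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)         # hidden-layer output matrix H

    def init_fit(self, X0, T0):
        """Closed-form fit on an initial batch."""
        H0 = self._hidden(X0)
        self.P = np.linalg.inv(H0.T @ H0 + 1e-3 * np.eye(H0.shape[1]))
        self.beta = self.P @ H0.T @ T0

    def partial_fit(self, X, T):
        """Recursive least-squares update on a new chunk of samples."""
        H = self._hidden(X)
        K = np.linalg.inv(np.eye(H.shape[0]) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self._hidden(X) @ self.beta
```

Here `init_fit` solves the first chunk in closed form, and each subsequent `partial_fit` folds new samples into the output weights without retraining from scratch; this incremental property is what makes the OS-ELM family attractive as a fast board-position evaluator during self-play.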

Highlights

  • Computer games are the longest-studied domain in the history of artificial intelligence

  • AlphaGo’s [3] descendant AlphaGo Zero [4] and the more general AlphaZero [5] achieved superhuman performance in Go, Chess, and Shogi without using any human knowledge or experience, thereby indicating that deep reinforcement learning combined with MCTS [6] achieves good results for such board games

  • Although AZ-style algorithms can be reproduced in code, it is currently difficult to apply them to other board games and achieve superhuman performance because of their dependence on large computational resources


Summary

INTRODUCTION

Computer games are the longest-studied domain in the history of artificial intelligence. IOS-ELM fully utilizes the feature extraction ability of CNNs and takes advantage of the excellent classification performance of OS-ELM; it shows good results on several popular classification test datasets, with fast training and high accuracy. AZ-style algorithms train a single neural network that uses a policy head to select moves and a value head to evaluate board positions [4]. In practical applications of AZ-style algorithms, self-play training without efficient computing resources is slow and results in low evaluation accuracy. The main intellectual contributions are summarized as follows: 1) We propose a new method, IOS-ELM, which integrates with AZ-style algorithms and can evaluate board positions.
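To make the architectural context concrete, below is a minimal, hypothetical PyTorch sketch of an AZ-style dual-head network for Gomoku. The layer counts, channel widths, input planes, and board size are assumptions chosen for illustration, not the paper's configuration; the comment on the value head only indicates where an OS-ELM-style evaluator, such as the sketch above, could plug in.

```python
import torch
import torch.nn as nn

class AZStyleNet(nn.Module):
    """Illustrative AZ-style policy/value network for Gomoku (hyperparameters are assumptions)."""

    def __init__(self, board_size=15, in_planes=4, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Policy head: one move logit per board intersection.
        self.policy = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(2 * board_size * board_size, board_size * board_size),
        )
        # Value head of the conventional AZ-style design; the paper's IOS-ELM targets
        # this board-evaluation role, fed by the convolutional features.
        self.value = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(board_size * board_size, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),
        )

    def forward(self, x):
        features = self.body(x)
        # Return policy logits, scalar value, and flattened CNN features.
        return self.policy(features), self.value(features), features.flatten(1)
```

The flattened `features` tensor returned by `forward` is the kind of CNN representation that could be handed to an OS-ELM-style evaluator in place of the dense value head, which is the design direction the paper pursues with IOS-ELM.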

RELATED WORKS
TRAINING OF AZ-STYLE ALGORITHMS
MATHEMATICAL PRINCIPLES AND IMPROVEMENT
NETWORK STRUCTURE AND TRAINING OF THE NEW MODEL
EXPERIMENTAL DESIGN AND RESULT ANALYSIS
Findings
CONCLUSIONS