Reinforcement learning of program agents is widely used today. In particular, Q-learning, a model-free reinforcement learning technique, has shown strong results in various applications such as games, self-driving cars, and robot control. For turn-based games, many researchers have successfully applied it to train artificial intelligence and to create competitive opponents for human players. While the algorithms are well known, there is room for parameter optimization to maximize learning speed for a specific problem such as a turn-based board game. As the results of this research show, the speed can vary significantly. Tic-Tac-Toe is a very simple and old game that makes it possible to try Q-learning without excessive effort, and the algorithm is universal enough to be applied to more complex games. It is worth noting that the core of the learning algorithm is the same for any similar game; only the rules and board size change, which is one of the important properties of Q-learning. This paper investigates the impact of the learning rate and discount factor on the learning speed of a Q-learning program agent in the Tic-Tac-Toe board game. A series of experiments was conducted using a computer implementation of the algorithm developed for this study to analyze the correlation between the learning rate, the discount factor, and the convergence rate of Q-learning in the specified game. The experimental design thus consists of two factors, each with three levels, so the full factorial experiment covers nine combinations of these factors. The dependence of learning speed on each factor is presented. The findings reveal strong relationships between these parameters and convergence speed: for example, learning speed increases in proportion to both factors, although for the discount factor the increase is about 1.4 times smaller.
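The two factors studied here, the learning rate and the discount factor, enter the standard tabular Q-learning update rule. As a minimal sketch (the variable names and illustrative parameter values below are assumptions, not the paper's actual implementation):

```python
# Minimal sketch of the tabular Q-learning update discussed in the paper.
# alpha (learning rate) and gamma (discount factor) are the two factors
# whose levels the experiments vary; the values here are illustrative.
from collections import defaultdict

alpha = 0.5   # learning rate (illustrative level)
gamma = 0.9   # discount factor (illustrative level)

Q = defaultdict(float)  # Q[(state, action)] -> estimated action value


def update(state, action, reward, next_state, next_actions):
    """One temporal-difference step of the standard Q-learning rule:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (
        reward + gamma * best_next - Q[(state, action)]
    )


# Example: a terminal winning move (no further actions) with reward 1
# propagates half of that reward into the table at alpha = 0.5.
update("X.O|.X.|..O", 8, 1.0, "X.O|.X.|..OX", [])
```

Because only the state encoding and the set of legal actions are game-specific, the same `update` routine carries over unchanged to larger boards, which is the universality property noted above.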
The practical significance of the research lies in optimizing these factors to train the software agent effectively and thereby save processing time, the cost of which is one of the main expenses of enterprises in the information technology field. In addition, the research contributes to a better understanding of how Q-learning performs in different game scenarios and provides guidelines for parameter selection in similar applications.