With the rapid development of AI, machine learning has become a hot topic, and reinforcement learning is one of its important branches. Thanks to the sustained efforts of researchers, new algorithms continue to emerge. Q-Learning is a classic reinforcement learning algorithm that serves as the basis of many others. It iteratively updates a Q-table so that the agent can choose the best action in each state and thereby approach the optimal solution. In essence, Q-Learning is an off-policy temporal-difference method. Off-policy learning involves two distinct policies: the target policy and the behavior policy. To balance exploration and exploitation, an ε-greedy strategy is adopted so that the agent retains some exploratory behavior, and the relevant hyperparameters, such as the learning rate (alpha) and the discount factor (gamma), must be set. However, the influence of these hyperparameters on Q-Learning is not yet well understood. This paper studies how the hyperparameters of the Q-Learning algorithm affect its convergence speed under a relatively simple model.
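
The update and exploration scheme described above can be illustrated with a minimal tabular sketch. It is not the paper's implementation: the six-state chain environment (`step`), the function names, and the default values of `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration, since the abstract does not specify the model used.

```python
import random


def epsilon_greedy(q_row, epsilon, rng):
    """Behavior policy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties at random so the untrained agent does not favor one action.
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def q_update(q, s, a, r, s_next, done, alpha, gamma):
    """One temporal-difference update of the Q-table.

    The target uses max over next actions (the greedy target policy),
    regardless of which action the behavior policy takes next; this is
    what makes the method off-policy.
    """
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])


def step(s, a, n_states):
    """Hypothetical chain environment (an assumption, not from the paper):
    action 1 moves right, action 0 moves left, reward 1 at the right end."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return s_next, (1.0 if done else 0.0), done


def train(alpha=0.5, gamma=0.9, epsilon=0.1, episodes=200,
          n_states=6, max_steps=1000, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = epsilon_greedy(q[s], epsilon, rng)
            s_next, r, done = step(s, a, n_states)
            q_update(q, s, a, r, s_next, done, alpha, gamma)
            s = s_next
            if done:
                break
    return q
```

Calling `train` with different values of `alpha`, `gamma`, and `epsilon` and recording how many episodes pass before the greedy policy stops changing is one simple way to compare convergence speed across hyperparameter settings.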