Solving the optimal mean-field control problem usually requires complete system information. In this paper, a Q-learning algorithm is presented for the optimal control problem of discrete-time mean-field stochastic systems with unknown parameters. First, by a suitable transformation, we turn the stochastic mean-field control problem into a deterministic one. Second, an H matrix is derived from the Q-function, and the control strategy depends only on this H matrix. Therefore, solving for the H matrix is equivalent to solving the mean-field optimal control problem. The proposed Q-learning method iteratively computes the H matrix and the gain matrix from input and system state data, without requiring knowledge of the system parameters. Next, it is proved that the sequence of control matrices generated by Q-learning converges to the optimal control, which establishes the theoretical feasibility of the method. Finally, two simulation examples verify the effectiveness of the Q-learning algorithm.
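To illustrate how a gain matrix can be recovered from an H matrix using data alone, the following is a minimal sketch for an ordinary deterministic linear-quadratic problem. It is not the paper's algorithm: the mean-field expectation terms and the noise are omitted, and the names `q_learning_lq`, `gain_from_H`, `quad_features`, `theta_to_H` and `step` are hypothetical. It assumes a quadratic Q-function Q(x, u) = zᵀHz with z = [x; u], a stage cost xᵀQx + uᵀRu, and a black-box `step(x, u)` that returns the next state.

```python
import numpy as np

def gain_from_H(H, n):
    """Greedy gain K (u = K x) from the blocks of H: K = -H_uu^{-1} H_ux."""
    H_ux = H[n:, :n]
    H_uu = H[n:, n:]
    return -np.linalg.solve(H_uu, H_ux)

def quad_features(z):
    """Monomials z_i z_j (i <= j), scaled so that z^T H z = features @ theta."""
    i, j = np.triu_indices(len(z))
    scale = np.where(i == j, 1.0, 2.0)  # off-diagonal entries appear twice
    return scale * z[i] * z[j]

def theta_to_H(theta, d):
    """Rebuild the symmetric d x d matrix H from its upper-triangular entries."""
    H = np.zeros((d, d))
    i, j = np.triu_indices(d)
    H[i, j] = theta
    H[j, i] = theta
    return H

def q_learning_lq(step, n, m, Q, R, iters=100, samples=60, seed=0):
    """Value-iteration style Q-learning; `step` is the only access to the system."""
    rng = np.random.default_rng(seed)
    d = n + m
    H = np.zeros((d, d))   # assumed initial guess H_0 = 0
    K = np.zeros((m, n))   # gain paired with H_0
    for _ in range(iters):
        Phi, y = [], []
        for _ in range(samples):
            x = rng.standard_normal(n)      # exploratory state
            u = rng.standard_normal(m)      # exploratory input
            x_next = step(x, u)             # observed transition, model unknown
            z_next = np.concatenate([x_next, K @ x_next])
            cost = x @ Q @ x + u @ R @ u
            Phi.append(quad_features(np.concatenate([x, u])))
            y.append(cost + z_next @ H @ z_next)  # Bellman target using current H
        # Least-squares fit of the next H to the sampled Bellman targets.
        theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
        H = theta_to_H(theta, d)
        K = gain_from_H(H, n)
    return H, K
```

In this sketch the learner only ever calls `step`, so the system matrices never enter the update; for a known test system the returned gain can be compared with the one obtained from a Riccati iteration.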