In practical applications, optimal consensus control via reinforcement learning under switching topologies remains challenging because of the complexity introduced by topological changes. This paper investigates the optimal consensus control problem for discrete-time multi-agent systems under Markov switching topologies. The goal is to design an algorithm that finds the optimal control policies minimizing the performance index while achieving consensus among the agents. The concept of mean-square consensus is introduced, and the relationship between the consensus error and the tracking error required to achieve mean-square consensus is studied. A performance function for each agent under switching topologies is established, and a data-driven policy iteration algorithm is proposed based on the Bellman optimality principle. Theoretical analysis shows that the consensus error achieves mean-square consensus and that the performance function is minimized. The efficacy of the proposed approach is verified by a numerical simulation implemented with an actor–critic neural network. The results show that the value function converges to the optimum and that mean-square consensus is reached with this technique.
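For context, policy iteration for a discrete-time quadratic cost alternates between policy evaluation (solving a Bellman equation for the current feedback gain) and policy improvement. The sketch below illustrates this in the simplest model-based, single-agent LQR setting (Hewer's algorithm); the dynamics A, B, the weights Q, R, and the initial gain are illustrative assumptions, and the sketch omits the paper's multi-agent structure, Markov switching topologies, data-driven evaluation, and actor–critic approximation.

```python
# Minimal policy-iteration sketch for a single discrete-time LQR problem
# (Hewer's algorithm). All matrices are illustrative assumptions and do NOT
# reproduce the paper's data-driven, multi-agent algorithm under Markov
# switching topologies.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[1.0, 0.1],
              [0.0, 1.0]])   # assumed agent dynamics x_{k+1} = A x_k + B u_k
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)                # assumed state weight in the quadratic cost
R = np.array([[1.0]])        # assumed control weight

K = np.array([[1.0, 2.0]])   # initial stabilizing feedback gain, u_k = -K x_k

for i in range(50):
    # Policy evaluation: solve P = (A - B K)^T P (A - B K) + Q + K^T R K
    A_cl = A - B @ K
    P = solve_discrete_lyapunov(A_cl.T, Q + K.T @ R @ K)
    # Policy improvement: K <- (R + B^T P B)^{-1} B^T P A
    K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

print("converged gain K:", K)
print("value matrix P:", P)
```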