Many recent studies have used reinforcement learning methods to investigate the behavior of agents in evolutionary games. Q-learning, in particular, has become a mainstream method during this development. Here we introduce Q-learning agents into the evolutionary prisoner's dilemma game on a square lattice. Specifically, we associate the state space of Q-learning agents with the strategies of their neighbors, and we introduce a neighboring reward information sharing mechanism. We thus provide Q-learning agents with the payoff information of their neighbors, in addition to their strategies, which has not been done in previous studies. Through simulations, we show that considering neighborhood payoff information can significantly promote cooperation in the population. Moreover, we show that for an appropriate strength of neighborhood payoff information sharing, a chessboard pattern emerges on the lattice. We analyze in detail the reasons for the emergence of the chessboard pattern and the increase in cooperation frequency, and we also provide a theoretical analysis based on the pair approximation method. We hope that our research will inspire effective approaches for resolving social dilemmas by means of sharing more information among reinforcement learning agents during evolutionary games.
Read full abstract