Abstract

This work discusses what an independent reinforcement learning agent can achieve in a multiagent environment. In particular, we consider a stateless Q-learning agent in a Prisoner's Dilemma (PD) game. Although it has been shown in the literature that stateless, independent Q-learning agents have difficulty cooperating with each other in an iterated PD (IPD) game, we give a condition on the PD payoffs and the Q-learning parameters that helps the agents cooperate with each other. Based on this condition, we also discuss the ratio of mutual cooperation occurring in IPD games. It has been supposed that mutual cooperation is fragile, i.e., that a single unfortunate defection would send the agents down the spiral of mutual defection. However, this is not always the case: mutual cooperation reinforces itself and is therefore robust and resilient. Hence, this work analytically derives how long a series of mutual cooperation continues once it has begun, taking this resilience into account. This gives further insight into the process of reinforcement learning in IPD games.
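To make the setting concrete, the following is a minimal sketch of two stateless, independent Q-learners playing an IPD. The payoff values, learning rate, discount factor, and exploration rate below are illustrative assumptions, not the condition derived in the paper; a stateless agent keeps one Q-value per action (C or D) and has no memory of past moves or opponent state.

```python
import random

# Illustrative PD payoff matrix with T > R > P > S (assumed values,
# not the payoff condition derived in the paper).
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.05  # assumed learning parameters

class StatelessQAgent:
    """Stateless Q-learner: one Q-value per action, no state or opponent model."""
    def __init__(self):
        self.q = {"C": 0.0, "D": 0.0}

    def act(self):
        # epsilon-greedy action selection
        if random.random() < EPSILON:
            return random.choice(["C", "D"])
        return max(self.q, key=self.q.get)

    def update(self, action, reward):
        # Stateless Q-update: Q(a) <- Q(a) + alpha * (r + gamma * max_a' Q(a') - Q(a))
        self.q[action] += ALPHA * (reward + GAMMA * max(self.q.values()) - self.q[action])

def mutual_cooperation_ratio(rounds=10000, seed=0):
    """Play an IPD and return the fraction of rounds with mutual cooperation."""
    random.seed(seed)
    a, b = StatelessQAgent(), StatelessQAgent()
    mutual = 0
    for _ in range(rounds):
        x, y = a.act(), b.act()
        ra, rb = PAYOFF[(x, y)]
        a.update(x, ra)
        b.update(y, rb)
        mutual += (x == "C" and y == "C")
    return mutual / rounds
```

Whether such a simulation settles into sustained mutual cooperation, and for how long a cooperative run persists after it starts, is exactly what the paper's condition on payoffs and parameters characterizes analytically.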

