Abstract

Motivated by the superior performance of model-based reinforcement learning over traditional RL, this paper extends traditional model-based reinforcement learning to a group of self-interested agents that select their actions consecutively while seeking the optimal policy. Each decision-making situation is modeled as an extensive-form game with perfect information. We propose a modified version of prioritized sweeping in which the subgame perfect equilibrium is taken as the optimal joint action. Finally, we analyze the algorithm and provide a formal convergence proof.
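
To make the equilibrium concept concrete, the following is a minimal sketch of backward induction, the standard way to compute a subgame perfect equilibrium in a finite extensive-form game with perfect information. It is not the modified prioritized sweeping algorithm of the paper; the `Node` class, the `backward_induction` function, and the example payoffs are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """A node in a perfect-information extensive-form game tree (illustrative only)."""
    player: Optional[int] = None                 # index of the agent who moves here; None at a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)  # action label -> successor node
    payoffs: Optional[Tuple[float, ...]] = None  # one payoff per agent, set only at leaves


def backward_induction(node: Node) -> Tuple[Tuple[float, ...], List[str]]:
    """Return (payoff vector, action sequence) along the subgame perfect equilibrium path."""
    if not node.children:
        return node.payoffs, []
    best_payoffs, best_path = None, None
    for action, child in node.children.items():
        payoffs, path = backward_induction(child)
        # The mover keeps the action that maximizes its own payoff;
        # ties are broken in favor of the first action encountered.
        if best_payoffs is None or payoffs[node.player] > best_payoffs[node.player]:
            best_payoffs, best_path = payoffs, [action] + path
    return best_payoffs, best_path


# Two agents act consecutively: agent 0 moves first, agent 1 observes the move and responds.
# The payoff numbers are made up for this example.
game = Node(player=0, children={
    "a": Node(player=1, children={
        "c": Node(payoffs=(3.0, 1.0)),
        "d": Node(payoffs=(0.0, 2.0)),
    }),
    "b": Node(player=1, children={
        "c": Node(payoffs=(2.0, 2.0)),
        "d": Node(payoffs=(1.0, 0.0)),
    }),
})

equilibrium_payoffs, equilibrium_path = backward_induction(game)
print(equilibrium_path, equilibrium_payoffs)  # ['b', 'c'] (2.0, 2.0)
```

In this toy game agent 0 forgoes action "a", even though it contains the highest payoff for agent 0, because agent 1 would respond with "d". The joint action selected by backward induction therefore accounts for each later agent's self-interested response, which is the role the abstract assigns to the subgame perfect equilibrium.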
