A statistical property of multiagent learning based on Markov decision process.

Kazunori Iwata, Kazushi Ikeda, Hideaki Sakai

doi:10.1109/tnn.2006.875990

Journal: IEEE Transactions on Neural Networks
Publication Date: Jul 1, 2006
Citations: 25

Affiliation: Hiroshima City University

#Asymptotic Equipartition Property #Multiagent Markov Decision Process

Abstract

We show that empirical sequences in an ergodic multiagent Markov decision process (MDP) satisfy an important property called the asymptotic equipartition property (AEP). Using the AEP, which facilitates the analysis of multiagent learning, we give a statistical property of multiagent learning methods, such as reinforcement learning (RL), near the end of the learning process. We examine how the conditions among the agents affect the achievement of a cooperative policy in three different cases: blind, visible, and communicable. We also derive a bound on the speed at which the empirical sequence converges in probability to the best sequence, so that the multiagent learning yields the best cooperative result.
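
As a rough illustration of what the AEP says, the sketch below simulates a single two-state ergodic Markov chain and checks that the per-symbol negative log-probability of a long sample path approaches the entropy rate of the chain. This is a simplified stand-in, not the multiagent MDP setting of the paper: the transition matrix `P` and the helper names `stationary`, `entropy_rate`, and `empirical_rate` are assumptions made for this example only.

```python
import math
import random

# Hypothetical two-state ergodic Markov chain (illustrative values,
# not taken from the paper). P[i][j] = probability of moving i -> j.
P = [[0.9, 0.1],
     [0.4, 0.6]]


def stationary(P):
    """Stationary distribution of a two-state chain (closed form)."""
    a, b = P[0][1], P[1][0]
    return [b / (a + b), a / (a + b)]


def entropy_rate(P):
    """Entropy rate in nats: -sum_i pi_i sum_j P_ij log P_ij."""
    pi = stationary(P)
    return -sum(
        pi[i] * P[i][j] * math.log(P[i][j])
        for i in range(2)
        for j in range(2)
        if P[i][j] > 0
    )


def empirical_rate(P, n, rng):
    """Sample a path of length n and return -(1/n) log of its probability."""
    pi = stationary(P)
    state = 0 if rng.random() < pi[0] else 1
    log_prob = math.log(pi[state])
    for _ in range(n - 1):
        nxt = 0 if rng.random() < P[state][0] else 1
        log_prob += math.log(P[state][nxt])
        state = nxt
    return -log_prob / n


if __name__ == "__main__":
    rng = random.Random(0)
    print("entropy rate:", entropy_rate(P))
    for n in (100, 10_000, 200_000):
        print(n, empirical_rate(P, n, rng))
```

For long paths the printed per-symbol values should cluster around the entropy rate, which is the sense in which almost all long sequences are roughly equally probable. The abstract's result concerns the analogous property for empirical sequences generated by several learning agents.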

Full Text