The Convergence of a Cooperation Markov Decision Process System.

Xiaoling Mo, Daoyun Xu, Zufeng Fu

doi:10.3390/e22090955


Open Access

https://doi.org/10.3390/e22090955


Abstract

In a general Markov decision process system, only one agent's learning evolution is considered. However, considering only a single agent has limitations in many problems, and more and more applications involve multiple agents. Multi-agent environments are of two types: cooperation and game. Therefore, this paper introduces a Cooperation Markov Decision Process system with two agents, which is suitable for the learning evolution of cooperative decisions between two agents. It is further found that the value function in the system also converges, and that the convergence value is independent of the choice of the initial value function. This paper presents an algorithm for finding the optimal strategy pair in the system, whose fundamental task is to find an optimal strategy pair and form an evolutionary system. Finally, an example is given to support the theoretical results.
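The claim that the limit does not depend on the initial value function can be illustrated with a minimal sketch. It is not the paper's CMDP system: it assumes an ordinary single-agent MDP with a discount factor below 1 and uses the standard Bellman optimality backup. The transition table P, the reward table R, the discount GAMMA and the function name value_iteration are all hypothetical and chosen only for illustration.

```python
# Hypothetical 3-state, 2-action MDP; every number here is an illustrative assumption.
GAMMA = 0.9  # assumed discount factor, must be < 1 for the contraction argument

# P[s][a] = list of (next_state, probability); R[s][a] = immediate reward
P = {
    0: {0: [(0, 0.5), (1, 0.5)], 1: [(2, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 0.3), (2, 0.7)]},
    2: {0: [(2, 1.0)], 1: [(0, 0.4), (1, 0.6)]},
}
R = {
    0: {0: 1.0, 1: 0.0},
    1: {0: 2.0, 1: 0.5},
    2: {0: 0.0, 1: 3.0},
}


def value_iteration(v0, tol=1e-10, max_iter=10_000):
    """Repeat the Bellman optimality backup from the initial guess v0."""
    v = list(v0)
    for _ in range(max_iter):
        new_v = [
            max(R[s][a] + GAMMA * sum(p * v[t] for t, p in P[s][a]) for a in P[s])
            for s in sorted(P)
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Two very different initial value functions.
v_from_zero = value_iteration([0.0, 0.0, 0.0])
v_from_far = value_iteration([100.0, -50.0, 7.0])

# Largest disagreement between the two limits; it should be close to zero.
max_gap = max(abs(x - y) for x, y in zip(v_from_zero, v_from_far))
print(max_gap)
```

Because the backup is a contraction in the sup norm when the discount is below 1, both runs approach the same fixed point, so the printed gap is only as large as the stopping tolerance allows.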

Highlights

Artificial intelligence has become one of the most important technologies today. AlphaGo, unmanned driving, voice recognition, face recognition and other well-known technologies all involve artificial intelligence.
This paper presents an algorithm for finding the optimal strategy pair in the Cooperation Markov Decision Process (CMDP) system, whose fundamental task is to find an optimal strategy pair and form an evolutionary system CMDP.
This paper only considers the Cooperation Markov Decision Process (CMDP) system of two agents, which is suitable for the evolutionary learning system of cooperative decisions between two agents.

Summary

Introduction

Artificial intelligence has become one of the most important technologies today. The important basis of reinforcement learning [5] is the Markov Decision Process (MDP) system [6]. Two-agent games are to multi-agent reinforcement learning what perceptrons are to neural networks. In this kind of learning model, agents alternately execute behaviors, seek optimal criteria based on social value, seek optimal strategies (πk0, πk1), and jointly complete the target task. This paper formally defines a cooperation Markov decision process system in which two agents (Alice and Bob) each perform actions according to their own strategies. The convergence property of the value function of the MDP system with a single agent is given, the convergence of the value function in the proposed cooperation Markov decision process system is further explored, and the correctness of the property is demonstrated from both the experimental and theoretical perspectives.
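The idea of two agents acting alternately and searching for a strategy pair that maximizes a shared social value can be sketched as follows. This is not the algorithm from the paper: it is a brute-force enumeration over deterministic strategy pairs on a toy model. The tables T and REWARD, the discount GAMMA, the two-state space, and the function names social_value and best_strategy_pair are all hypothetical. Agent 0 stands for Alice and agent 1 for Bob, and after each move the turn passes to the other agent.

```python
from itertools import product

GAMMA = 0.9        # assumed discount factor
STATES = (0, 1)    # hypothetical state space
ACTIONS = (0, 1)   # hypothetical action set, shared by both agents

# T[agent][s][a] = list of (next_state, probability) when `agent` moves in state s.
T = {
    0: {0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
        1: {0: [(0, 0.5), (1, 0.5)], 1: [(1, 1.0)]}},
    1: {0: {0: [(1, 1.0)], 1: [(0, 1.0)]},
        1: {0: [(0, 1.0)], 1: [(0, 0.2), (1, 0.8)]}},
}
# REWARD[agent][s][a] = shared (social) reward collected for that move.
REWARD = {
    0: {0: {0: 0.0, 1: 1.0}, 1: {0: 2.0, 1: 0.5}},
    1: {0: {0: 1.0, 1: 0.0}, 1: {0: 0.0, 1: 3.0}},
}


def social_value(pi0, pi1, tol=1e-10, max_iter=10_000):
    """Evaluate a fixed strategy pair; v[i][s] is the value when agent i moves in s."""
    pi = (pi0, pi1)
    v = [[0.0 for _ in STATES] for _ in range(2)]
    for _ in range(max_iter):
        new_v = [
            [REWARD[i][s][pi[i][s]]
             + GAMMA * sum(p * v[1 - i][t] for t, p in T[i][s][pi[i][s]])
             for s in STATES]
            for i in range(2)
        ]
        gap = max(abs(new_v[i][s] - v[i][s]) for i in range(2) for s in STATES)
        v = new_v
        if gap < tol:
            break
    return v


def best_strategy_pair(start_state=0):
    """Enumerate all deterministic strategy pairs; Alice (agent 0) moves first."""
    strategies = list(product(ACTIONS, repeat=len(STATES)))
    best = None
    for pi0, pi1 in product(strategies, strategies):
        value = social_value(pi0, pi1)[0][start_state]
        if best is None or value > best[0]:
            best = (value, pi0, pi1)
    return best


best_value, best_pi0, best_pi1 = best_strategy_pair()
print(best_value, best_pi0, best_pi1)
```

Enumeration is only practical for tiny models such as this one, since the number of strategy pairs grows exponentially with the number of states. It is used here only to make the notion of an optimal strategy pair concrete.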

Markov Reward Process System

Markov Decision Process System

Cooperation Markov Decision Process System

Cooperation Markov Decision Process System with Two Agents

Convergence of the Social Value Function of CMDP System

Algorithm for Optimal Strategy Pairs in Cooperation Type CMDP System

An Application Example of Cooperation Markov Decision Process System

Conclusions


Journal: Entropy (Basel, Switzerland)	Publication Date: Aug 30, 2020
Citations: 1	License type: CC BY 4.0


Similar Papers

Horizontal cooperations between logistics service providers: motives, structure, performance
Christina Schmoltzi ... Carl Marcus Wallenburg
International Journal of Physical Distribution & Logistics Management | VOL. 41
12 Jul 2011

Successive Over-Relaxation Q-Learning
Chandramouli Kamanchi ... Shalabh Bhatnagar
IEEE Control Systems Letters | VOL. 4
01 Jan 2020

A Post-Disaster Functional Asset Value Index for School Buildings
Rizalyn C Ilumin ... Andres Winston C Oreta
Procedia Engineering | VOL. 212
01 Jan 2018

Value Function Discovery in Markov Decision Processes With Evolutionary Algorithms
Martijn Onderwater ... Rob Van Der Mei
IEEE Transactions on Systems, Man, and Cybernetics: Systems | VOL. 46
01 Jan 2015
