Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Yizhou Zhang, Yiheng Lin, Zaiwei Chen, Guannan Qu, Pan Xu, Adam Wierman

doi:10.1145/3579443

Abstract

We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensionality and to reduce communication, we propose a Localized Policy Iteration (LPI) algorithm that provably learns a near-globally-optimal policy using only local information. In particular, we show that, despite restricting each agent's attention to only its κ-hop neighborhood, the agents are able to learn a policy with an optimality gap that decays polynomially in κ. In addition, we show the finite-sample convergence of LPI to the global optimal policy, which explicitly captures the trade-off between optimality and computational complexity in choosing κ. Numerical simulations demonstrate the effectiveness of LPI.
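
As a rough illustration of the κ-hop neighborhood the abstract refers to, the sketch below collects every agent within κ hops of a given agent using breadth-first search. It is not the LPI algorithm itself; the function name `k_hop_neighborhood`, the adjacency-dictionary representation, and the six-agent line network are assumptions made only for this example.

```python
from collections import deque


def k_hop_neighborhood(adjacency, agent, kappa):
    """Return the set of agents within `kappa` hops of `agent`, including itself.

    `adjacency` maps each agent to an iterable of its direct neighbors
    (a hypothetical representation chosen for this sketch).
    """
    distance = {agent: 0}
    queue = deque([agent])
    while queue:
        node = queue.popleft()
        # Do not expand past the kappa-hop boundary.
        if distance[node] == kappa:
            continue
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)


# Hypothetical line network of six agents: 0 - 1 - 2 - 3 - 4 - 5
line = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}

one_hop = k_hop_neighborhood(line, 2, 1)
two_hop = k_hop_neighborhood(line, 0, 2)
```

On a sparse network the neighborhood grows slowly with κ, which is one way to read the trade-off the abstract describes: a larger κ gives each agent more information at the cost of a larger local problem.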

Journal: Proceedings of the ACM on Measurement and Analysis of Computing Systems
Publication Date: Feb 27, 2023
Citations: 3

Similar Papers

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning
Yizhou Zhang ... Pan Xu
19 Jun 2023

Distributed Policy Evaluation with Fractional Order Dynamics in Multiagent Reinforcement Learning
Wei Dai ... Zhenhua Tan
Security and Communication Networks | VOL. 2021
03 Sep 2021

Policy Adaptive Multi-agent Deep Deterministic Policy Gradient
Yixiang Wang ... Feng Wu
01 Jan 2020

Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms on a Building Energy Demand Coordination Task
Gauraang Dhamankar ... Zoltan Nagy
17 Nov 2020
