MDPGT: Momentum-Based Decentralized Policy Gradient Tracking

Zhanhong Jiang, Aditya Balu, Soumik Sarkar, Young M Lee, Kai Liang Tan, Chinmay Hegde, Xian Yeow Lee, Sin Yong Tan

doi:10.1609/aaai.v36i9.21169

Abstract

We propose a novel policy gradient method for multi-agent reinforcement learning, which leverages two different variance-reduction techniques and does not require large batches over iterations. Specifically, we propose momentum-based decentralized policy gradient tracking (MDPGT), in which a new momentum-based variance reduction technique is used to approximate the local policy gradient surrogate with importance sampling, and an intermediate parameter is adopted to track two consecutive policy gradient surrogates. MDPGT provably achieves the best available sample complexity of $O(N^{-1}\epsilon^{-3})$ for converging to an $\epsilon$-stationary point of the global average of $N$ local performance functions (possibly nonconcave). This outperforms the state-of-the-art sample complexity in decentralized model-free reinforcement learning, and when initialized with a single trajectory, the sample complexity matches that obtained by the existing decentralized policy gradient methods. We further validate the theoretical claim for the Gaussian policy function. When the required error tolerance $\epsilon$ is small enough, MDPGT leads to a linear speedup, which has been previously established in decentralized stochastic optimization, but not for reinforcement learning. Lastly, we provide empirical results on a multi-agent reinforcement learning benchmark environment to support our theoretical findings.
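The two ingredients named in the abstract, a momentum-based variance-reduced gradient surrogate and a tracking variable exchanged between neighbors, can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the toy concave objectives J_i(x) = -0.5 * ||x - c_i||^2 standing in for the local performance functions, the ring mixing matrix, the step size `eta`, the momentum parameter `beta`, and the exact update order. Because the toy noise does not depend on the parameters, the importance weight is trivially 1; in a real policy gradient setting it would be a ratio of trajectory probabilities under the previous and current policies.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic matrix: each agent averages itself and its two ring neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 1.0 / 3.0
        W[i, (i - 1) % n] += 1.0 / 3.0
        W[i, (i + 1) % n] += 1.0 / 3.0
    return W

def stochastic_grad(x, centers, noise):
    """Noisy gradient of the toy objective J_i(x) = -0.5 * ||x - c_i||^2 (hypothetical stand-in)."""
    return -(x - centers) + noise

def run(n_agents=5, dim=3, steps=400, eta=0.1, beta=0.2, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    W = ring_mixing_matrix(n_agents)
    centers = rng.normal(size=(n_agents, dim))  # one local optimum per agent
    x = np.zeros((n_agents, dim))               # row i holds agent i's parameters

    # Single-sample initialization of the surrogate u and the tracker v.
    noise = sigma * rng.normal(size=(n_agents, dim))
    u = stochastic_grad(x, centers, noise)
    v = W @ u
    x_prev, x = x, W @ (x + eta * v)            # ascent step followed by neighbor averaging

    for _ in range(steps):
        # The same sample is evaluated at the current and the previous parameters.
        noise = sigma * rng.normal(size=(n_agents, dim))
        g_new = stochastic_grad(x, centers, noise)
        g_old = stochastic_grad(x_prev, centers, noise)
        weight = 1.0  # importance weight; equals 1 only because the toy noise ignores x

        # Momentum-based variance-reduced surrogate of the local gradient.
        u_new = beta * g_new + (1.0 - beta) * (u + g_new - weight * g_old)

        # Tracking: mix neighbors' trackers plus the change between consecutive surrogates.
        v = W @ (v + u_new - u)
        u = u_new

        x_prev, x = x, W @ (x + eta * v)
    return x, centers

if __name__ == "__main__":
    x, centers = run()
    print("distance of agent average to global optimum:",
          np.linalg.norm(x.mean(axis=0) - centers.mean(axis=0)))
    print("largest disagreement between agents:",
          np.abs(x - x.mean(axis=0)).max())
```

In this sketch each update uses one fresh sample per agent rather than a large batch, and the tracker `v` keeps its average across agents equal to the average of the surrogates `u`, which is what lets every agent move toward the optimum of the global average rather than its own local optimum.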

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 2

Similar Papers

Reinforcement Learning for Clinical Applications.
Kia Khezeli ... Benjamin Shickel
Clinical journal of the American Society of Nephrology : CJASN | VOL. 18
08 Feb 2023

Q-Prop: Sample-efficient policy gradient with an off-policy critic
28 Feb 2017

Distributional Policy Gradient With Distributional Value Function.
Qi Liu ... Yunjiang Lou
IEEE transactions on neural networks and learning systems | VOL. PP
01 Jan 2024

A distributed adaptive policy gradient method based on momentum for multi-agent reinforcement learning
Junru Shi ... Qingtao Wu
Complex & Intelligent Systems | VOL. 10
12 Jul 2024
