Difference rewards policy gradients

Jacopo Castellini, Sam Devlin, Rahul Savani, Frans A Oliehoek

doi:10.1007/s00521-022-07960-5

Abstract

Policy gradient methods have become one of the most popular classes of algorithms for multi-agent reinforcement learning. A key challenge, however, that is not addressed by many of these methods is multi-agent credit assignment: assessing an agent’s contribution to the overall performance, which is crucial for learning good policies. We propose a novel algorithm called Dr.Reinforce that explicitly tackles this by combining difference rewards with policy gradients to allow for learning decentralized policies when the reward function is known. By differencing the reward function directly, Dr.Reinforce avoids difficulties associated with learning the Q-function as done by counterfactual multi-agent policy gradients (COMA), a state-of-the-art difference rewards method. For applications where the reward function is unknown, we show the effectiveness of a version of Dr.Reinforce that learns an additional reward network that is used to estimate the difference rewards.
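To make "differencing the reward function directly" concrete, here is a minimal sketch of one common form of difference reward: the actual joint reward minus the expected reward when one agent's action is resampled from its own policy while the other agents' actions stay fixed. The abstract does not state the exact formula, so this form is an assumption, and `difference_reward`, `toy_reward`, and the two-agent setup are hypothetical names chosen for illustration.

```python
from typing import Callable, Sequence, Tuple

JointAction = Tuple[int, ...]


def difference_reward(
    reward_fn: Callable[[object, JointAction], float],
    state: object,
    joint_action: JointAction,
    agent: int,
    policy_probs: Sequence[float],
) -> float:
    """Joint reward minus a counterfactual baseline for one agent.

    Assumed form (not taken from the abstract): the baseline is the expected
    reward when `agent`'s action is replaced by each alternative action,
    weighted by that agent's policy probabilities, with the other agents'
    actions held fixed.
    """
    actual = reward_fn(state, joint_action)
    baseline = 0.0
    for alt_action, prob in enumerate(policy_probs):
        counterfactual = list(joint_action)
        counterfactual[agent] = alt_action
        baseline += prob * reward_fn(state, tuple(counterfactual))
    return actual - baseline


def toy_reward(state: object, joint_action: JointAction) -> float:
    # Hypothetical known reward: 1 only if every agent picks action 1.
    return 1.0 if all(a == 1 for a in joint_action) else 0.0


uniform = [0.5, 0.5]  # hypothetical policy over two actions
d_coop = difference_reward(toy_reward, None, (1, 1), agent=0, policy_probs=uniform)
d_defect = difference_reward(toy_reward, None, (0, 1), agent=0, policy_probs=uniform)
d_irrelevant = difference_reward(toy_reward, None, (0, 1), agent=1, policy_probs=uniform)
```

In a REINFORCE-style update, a signal like this would weight each agent's own log-probability gradient in place of the shared team reward. In the toy case, agent 1 gets a zero signal when agent 0 has already made the team reward unreachable, which is the kind of per-agent credit the abstract refers to.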



Journal: Neural Computing and Applications
Publication Date: Nov 11, 2022
License type: CC BY 4.0

Similar Papers

PP-PG: Combining Parameter Perturbation with Policy Gradient Methods for Effective and Efficient Explorations in Deep Reinforcement Learning
Shilei Li ... Meng Li
ACM Transactions on Intelligent Systems and Technology | VOL. 12
03 Jun 2021

Researches advanced in multi-agent credit assignment in reinforcement learning
Yuzheng Wu
11 Nov 2022

RevAP: A bankruptcy-based algorithm to solve the multi-agent credit assignment problem in task start threshold-based multi-agent systems
Hossein Yarahmadi ... Moharram Challenger
Robotics and Autonomous Systems | VOL. 174
12 Jan 2024

Dynamic Economic Optimization of a Continuously Stirred Tank Reactor Using Reinforcement Learning
Derek Machalek ... Titus Quah
01 Jul 2020
