Efficient Multiagent Policy Optimization Based on Weighted Estimators in Stochastic Cooperative Environments

Yan Zheng, Zhao-Peng Meng, Xiao-Tian Hao, Jian-Ye Hao, Zong-Zhang Zhang

doi:10.1007/s11390-020-9967-6

Abstract

Multiagent deep reinforcement learning (MA-DRL) has received increasingly wide attention. Most existing MA-DRL algorithms, however, remain inefficient when faced with the non-stationarity caused by agents constantly changing their behavior in stochastic environments. This paper extends the weighted double estimator to multiagent domains and proposes an MA-DRL framework named Weighted Double Deep Q-Network (WDDQN). By combining the weighted double estimator with a deep neural network, WDDQN can not only reduce estimation bias effectively but also handle scenarios with raw visual inputs. To achieve efficient cooperation in multiagent domains, we introduce a lenient reward network and a scheduled replay strategy. Empirical results show that WDDQN outperforms an existing DRL algorithm (double DQN) and an MA-DRL algorithm (lenient Q-learning) in terms of average reward and convergence speed, and is more likely to converge to the Pareto-optimal Nash equilibrium in stochastic cooperative environments.
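The abstract does not spell out the weighted double estimator, so the following is only a minimal sketch of one common formulation of weighted double Q-learning, not necessarily the exact update used in WDDQN. The function name, the inputs q_u_next and q_v_next (action values from two estimators at the next state), and the hyperparameter c are assumptions made for illustration. The idea is to blend the single estimator, which tends to overestimate, with the double estimator, which tends to underestimate.

```python
def weighted_double_target(q_u_next, q_v_next, reward, gamma=0.99, c=1.0, done=False):
    """Sketch of a weighted double estimator target for one transition.

    q_u_next, q_v_next: lists of action values from two estimators at the
    next state (hypothetical inputs, e.g. two networks' outputs).
    c: assumed positive hyperparameter controlling the weighting.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    actions = range(len(q_u_next))
    # Best and worst actions according to the first estimator.
    a_star = max(actions, key=lambda a: q_u_next[a])
    a_low = min(actions, key=lambda a: q_u_next[a])
    # The weight grows toward 1 as the second estimator separates these actions.
    gap = abs(q_v_next[a_star] - q_v_next[a_low])
    beta = gap / (c + gap)
    # beta = 0 reduces to the double estimator; beta near 1 leans on the
    # single estimator (the max of q_u_next).
    blended = beta * q_u_next[a_star] + (1.0 - beta) * q_v_next[a_star]
    return reward + gamma * blended
```

With q_u_next = [0.0, 4.0], q_v_next = [1.0, 2.0] and c = 1.0, the weight is 0.5, so the bootstrapped value falls halfway between the double estimate (2.0) and the single estimate (4.0).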


Journal: Journal of Computer Science and Technology
Publication Date: Mar 1, 2020
Citations: 11

