An extended flexible job shop scheduling problem is presented with the characteristics of technology and path flexibility (dual flexibility), varied transportation times, and an uncertain environment. Effective scheduling can greatly increase efficiency and safety in complex scenarios such as distributed vehicle manufacturing and the maintenance of multiple aircraft. However, optimizing the schedule imposes stricter requirements on accuracy, real-time performance, and generalization, while being subject to the curse of dimensionality and typically incomplete information. The various coupling relations among operations, stations, and resources further aggravate the problem. To address these challenges, we propose a multi-agent reinforcement learning algorithm in which the scheduling environment is modeled as a decentralized partially observable Markov decision process. Each job is regarded as an agent that decides its next triplet, i.e., operation, station, and employed resource. The novelty of this paper lies in addressing a flexible job shop scheduling problem that considers both dual flexibility and varied transportation times, and in proposing a double Q-value mixing (DQMIX) optimization algorithm under a multi-agent reinforcement learning framework. Case-study experiments show that the DQMIX algorithm outperforms existing multi-agent reinforcement learning algorithms in terms of solution accuracy, stability, and generalization. In addition, it achieves better solution quality for larger-scale cases than traditional intelligent optimization algorithms.
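As an illustration of the value-mixing architecture the abstract describes, below is a minimal sketch in Python/PyTorch of a QMIX-style monotonic mixing network on which a double Q-value mixing scheme could build. The abstract gives no implementation details, so the class name MonotonicMixer, the layer sizes, and the structure are hypothetical assumptions rather than the authors' code: each job agent contributes the Q-value of its chosen (operation, station, resource) triplet, and state-conditioned hypernetworks produce non-negative weights so that the joint Q-value is monotonic in every agent's Q-value.

# Hypothetical sketch of a QMIX-style monotonic mixer; all names and
# dimensions are illustrative assumptions, not the paper's DQMIX code.
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """Mixes per-job-agent Q-values into one joint Q-value.

    Hypernetworks conditioned on the global state emit non-negative
    weights (via abs), so the joint Q-value is monotonically
    non-decreasing in each agent's individual Q-value.
    """
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.embed_dim = embed_dim
        # Hypernetworks: map the global state to mixing weights/biases.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) Q-values, one per job agent,
        # each evaluated at its chosen (operation, station, resource)
        # triplet; state: (batch, state_dim) global shop-floor state.
        b = agent_qs.size(0)
        w1 = torch.abs(self.hyper_w1(state)).view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b, 1)  # joint Q-value

# Usage sketch: 4 job agents, a 16-dimensional global state.
mixer = MonotonicMixer(n_agents=4, state_dim=16)
q_joint = mixer(torch.randn(8, 4), torch.randn(8, 16))  # shape (8, 1)

In a full DQMIX-style learner one would expect two such Q-value streams (hence "double") combined with target networks to curb overestimation, but the abstract does not specify that detail.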