Mutual information oriented deep skill chaining for multi‐agent reinforcement learning

Zaipeng Xie, Wenzhan Song, Chentai Qiao, Yufeng Zhang, Yujing Zhang, Zewen Li, Cheng Ji

doi:10.1049/cit2.12322

Abstract

Multi‐agent reinforcement learning relies on reward signals to guide the policy networks of individual agents. However, in high‐dimensional continuous spaces, the non‐stationary environment can supply outdated experiences that hinder convergence and degrade the training performance of multi‐agent systems. To tackle this issue, a novel reinforcement learning scheme, Mutual Information Oriented Deep Skill Chaining (MioDSC), is proposed. It generates an optimised cooperative policy by incorporating intrinsic rewards based on mutual information to improve exploration efficiency. These rewards encourage agents to diversify their learning by taking actions that increase the mutual information between their actions and the environment state. In addition, MioDSC can generate cooperative policies using the options framework, which allows agents to learn and reuse complex action sequences and accelerates the convergence of multi‐agent learning. MioDSC was evaluated in the multi‐agent particle environment and the StarCraft multi‐agent challenge at varying difficulty levels. The experimental results show that MioDSC outperforms state‐of‐the‐art methods and performs robustly and stably across a range of multi‐agent tasks.
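The abstract does not say how the mutual information reward is estimated. As a rough illustration only, the sketch below uses a count-based (plug-in) estimate of I(S; A) for small discrete state and action spaces, and adds the pointwise mutual information of each visited state-action pair, scaled by a weight `beta`, to the extrinsic reward. The class name `MutualInfoBonus`, the weight `beta`, and the pointwise decomposition are hypothetical choices for this sketch, not MioDSC's method; the high‐dimensional continuous spaces the paper targets would need a learned estimator instead of counts.

```python
import math
from collections import Counter


class MutualInfoBonus:
    """Count-based mutual information between discrete states and actions.

    Hypothetical illustration: MioDSC's actual estimator is not described
    in the abstract, and this plug-in estimator only suits small discrete spaces.
    """

    def __init__(self, beta=0.1):
        self.beta = beta          # hypothetical weight of the intrinsic reward
        self.joint = Counter()    # counts of (state, action) pairs
        self.states = Counter()   # marginal counts of states
        self.actions = Counter()  # marginal counts of actions
        self.total = 0            # number of recorded transitions

    def update(self, state, action):
        # Record one observed (state, action) pair.
        self.joint[(state, action)] += 1
        self.states[state] += 1
        self.actions[action] += 1
        self.total += 1

    def pointwise_mi(self, state, action):
        # log p(s, a) / (p(s) p(a)) = log n_sa * N / (n_s * n_a); zero for unseen pairs.
        n_sa = self.joint[(state, action)]
        if n_sa == 0:
            return 0.0
        return math.log(n_sa * self.total / (self.states[state] * self.actions[action]))

    def mutual_information(self):
        # I(S; A) = sum over (s, a) of p(s, a) * pointwise MI(s, a)
        return sum(
            (n / self.total) * self.pointwise_mi(s, a)
            for (s, a), n in self.joint.items()
        )

    def shaped_reward(self, extrinsic, state, action):
        # Extrinsic reward plus the scaled intrinsic bonus for the visited pair.
        return extrinsic + self.beta * self.pointwise_mi(state, action)
```

For instance, after recording the pairs (0, 0), (1, 1), (0, 0), (1, 1), where the action is fully determined by the state, `mutual_information()` returns log 2 ≈ 0.693; after (0, 0), (0, 1), (1, 0), (1, 1), where actions are independent of states, it returns 0. Averaging the pointwise bonus over visited pairs recovers the full mutual information, which is why this sketch uses it as a per-step reward.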

Journal: CAAI Transactions on Intelligence Technology
Publication Date: Mar 28, 2024
License type: CC BY-NC 4.0

Similar Papers

Deep Skill Chaining with Diversity for Multi-agent Systems
Zaipeng Xie ... Cheng Ji
01 Jan 2021

Control of a bioreactor using a new partially supervised reinforcement learning algorithm
B Jaganatha Pandian ... Mathew Mithra Noel
Journal of Process Control | VOL. 69
24 Jul 2018

Learning in continuous action space for developing high dimensional potential energy models
Sukriti Manna ... Bilvin Varughese
Nature Communications | VOL. 13
18 Jan 2022

Reinforcement-learning-assisted quantum optimization
Matteo M Wauters ... Emanuele Panizon
Physical Review Research | VOL. 2
18 Sep 2020
