Abstract

In the Deep Reinforcement Learning (DRL) domain, a compound learning task is often decomposed into several sub-tasks in a divide-and-conquer manner, each trained separately and then fused to accomplish the original task, a process referred to as policy fusion. However, state-of-the-art (SOTA) policy fusion methods treat all sub-tasks as equally important throughout the task process, which prevents the agent from relying on different sub-tasks at different stages. To address this limitation, we propose a generic policy fusion approach, referred to as Policy Fusion Learning with Dynamic Weights and Prior Reward (PFLDWPR), to automate the time-varying selection of sub-tasks. Specifically, PFLDWPR produces a time-varying one-hot vector over the sub-tasks to dynamically select the most suitable sub-task and mask the rest at each stage of the task, enabling the fused policy to optimally guide the agent in executing the compound task. The sub-task policies, weighted by this dynamic one-hot vector, are then aggregated to obtain the action policy for the original task. Moreover, we collect the sub-tasks' rewards at the pre-training stage as a prior reward, which, together with the current reward, is used to train the policy fusion network; this reduces fusion bias by leveraging prior experience. Experimental results on three popular learning tasks demonstrate that the proposed method significantly improves over three SOTA policy fusion methods in terms of task duration, episode reward, and score difference.
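The sketch below illustrates, in broad strokes, the mechanism the abstract describes: a fusion network maps the current state to a one-hot mask that selects one pre-trained sub-policy per step, and the training signal for that network augments the environment reward with a prior reward recorded at pre-training time. This is only a minimal reading of the abstract, not the authors' implementation; all names (FusionGate, fuse_actions, prior_rewards, beta) and the specific network shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn


class FusionGate(nn.Module):
    """Maps a state to logits over the K sub-tasks (hypothetical fusion network)."""

    def __init__(self, state_dim: int, num_subtasks: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_subtasks),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # (batch, K) logits


def fuse_actions(state, sub_policies, gate):
    """Select one sub-policy per step via a one-hot mask and return its action."""
    logits = gate(state)                                   # (batch, K)
    selected = logits.argmax(dim=-1)                       # (batch,)
    one_hot = nn.functional.one_hot(
        selected, num_classes=len(sub_policies)
    ).float()                                              # (batch, K) time-varying mask
    # Each pre-trained sub-policy proposes an action; the mask keeps only the chosen one.
    actions = torch.stack([p(state) for p in sub_policies], dim=1)  # (batch, K, act_dim)
    fused = (one_hot.unsqueeze(-1) * actions).sum(dim=1)   # (batch, act_dim)
    return fused, selected


def augmented_reward(env_reward, selected, prior_rewards, beta=0.5):
    """Blend the current reward with the selected sub-task's pre-training (prior) reward.

    prior_rewards: tensor of length K with each sub-task's reward collected at
    the pre-training stage; beta is an assumed mixing coefficient.
    """
    return env_reward + beta * prior_rewards[selected]
```

Under these assumptions, the fusion network would be trained with the augmented reward, so that sub-task selection is biased by prior experience rather than by the current return alone.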
