Self-Attention based Temporal Intrinsic Reward for Reinforcement Learning

Zhuo Jiang, Zhihong Peng, Qingkai Yang, Daiying Tian

doi:10.1109/cac53003.2021.9727314

Abstract

This paper proposes a self-attention based temporal intrinsic reward model for reinforcement learning (RL) that synthesizes a control policy for an agent facing sparse rewards in partially observable environments. The approach addresses the temporal credit assignment problem to some extent and mitigates inefficient exploration. We first introduce a sequence-based self-attention mechanism to generate temporal features, which capture the temporal properties of the task for the agent. During training, the temporal features are used in each sampled episode to produce intrinsic rewards, which are combined with the extrinsic reward to help the agent learn a feasible policy. We then update the intrinsic reward model with a meta-gradient so that the agent achieves better performance. Experiments demonstrate the superiority of the proposed method.
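
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single-head attention, the linear reward head `w_out`, the additive mixing weight `beta`, and all dimensions are assumptions chosen for illustration, and the meta-gradient update of the reward model is omitted. It shows only how self-attention over one sampled episode could yield per-step features, and how an intrinsic reward derived from them might be added to a sparse extrinsic reward.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(seq, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over one episode.

    seq holds one observation vector per time step; the result holds one
    feature vector per time step, each a weighted mix of all value vectors.
    """
    queries = [matvec(w_q, x) for x in seq]
    keys = [matvec(w_k, x) for x in seq]
    values = [matvec(w_v, x) for x in seq]
    scale = math.sqrt(len(keys[0]))
    features = []
    for q in queries:
        weights = softmax([dot(q, k) / scale for k in keys])
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        features.append(mixed)
    return features


def intrinsic_rewards(features, w_out):
    """Hypothetical linear head squashed by tanh: one scalar per time step."""
    return [math.tanh(dot(w_out, f)) for f in features]


def combined_rewards(extrinsic, intrinsic, beta=0.1):
    """Additive mix of the two signals; beta is an assumed weighting."""
    return [r_ex + beta * r_in for r_ex, r_in in zip(extrinsic, intrinsic)]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy episode with randomly initialised (untrained) parameters.
rng = random.Random(0)
obs_dim, feat_dim, steps = 4, 3, 6
episode = [[rng.uniform(-1.0, 1.0) for _ in range(obs_dim)] for _ in range(steps)]
w_q, w_k, w_v = (random_matrix(feat_dim, obs_dim, rng) for _ in range(3))
w_out = [rng.uniform(-0.5, 0.5) for _ in range(feat_dim)]

extrinsic = [0.0] * (steps - 1) + [1.0]  # sparse: only the final step is rewarded
features = self_attention(episode, w_q, w_k, w_v)
r_in = intrinsic_rewards(features, w_out)
r_total = combined_rewards(extrinsic, r_in)
```

In this sketch every step receives a nonzero learning signal even though the extrinsic reward is zero until the last step. In the method the abstract describes, the reward model's parameters would additionally be tuned by a meta-gradient, which this example leaves out.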

Similar Papers

Sampling diversity driven exploration with state difference guidance
Jiayi Lu ... Junwei Zhang
Expert Systems With Applications | VOL. 203
06 May 2022

Author response: On the normative advantages of dopamine and striatal opponency for learning and choice
Alana Jaskir ... Michael J Frank
14 Feb 2023

A Fuzzy Curiosity-Driven Mechanism for Multi-Agent Reinforcement Learning
Wenbai Chen ... Jingchen Li
International Journal of Fuzzy Systems | VOL. 23
13 Feb 2021

Value and reward based learning in neurorobots
Jeffrey L Krichmar ... Florian Röhrbein
Frontiers in Neurorobotics | VOL. 7
01 Jan 2013
