Robust Action Gap Increasing with Clipped Advantage Learning

Zhe Zhang, Yaozhong Gan, Xiaoyang Tan

doi:10.1609/aaai.v36i8.20900

Abstract

Advantage Learning (AL) seeks to increase the action gap between the optimal action and its competitors, so as to improve robustness to estimation errors. However, the method becomes problematic when the optimal action induced by the approximated value function does not agree with the true optimal action. In this paper, we present a novel method, named clipped Advantage Learning (clipped AL), to address this issue. The method is motivated by our observation that blindly increasing the action gap for all given samples, without considering whether each sample needs it, can accumulate more errors in the performance loss bound and lead to slow value convergence. To avoid that, the advantage value should be adjusted adaptively. We show that our simple clipped AL operator not only enjoys a fast convergence guarantee but also retains proper action gaps, hence achieving a good balance between a large action gap and fast convergence. The feasibility and effectiveness of the proposed method are verified empirically on several RL benchmarks with promising performance.
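To make the contrast in the abstract concrete, here is a minimal tabular sketch in Python. The AL backup uses the commonly cited form, the Bellman optimality backup minus `alpha * (V(s) - Q(s, a))`. The abstract does not state the exact form of the clipped operator, so the gate below is an illustrative assumption: the gap-increasing term is applied only to actions whose value, measured relative to an assumed lower bound `q_lower`, is at least a fraction `c` of the greedy value. The names `q_lower`, `c`, and the toy MDP are all hypothetical.

```python
import numpy as np

def bellman_optimality(Q, P, R, gamma):
    """Bellman optimality backup for a tabular MDP.
    Q, R: (S, A) arrays; P: (S, A, S) transition probabilities."""
    V = Q.max(axis=1)                 # V(s) = max_a Q(s, a)
    return R + gamma * (P @ V)        # expected next-state value, shape (S, A)

def advantage_learning(Q, P, R, gamma, alpha):
    """AL backup: subtract alpha * (V(s) - Q(s, a)) for every (s, a)."""
    V = Q.max(axis=1, keepdims=True)
    return bellman_optimality(Q, P, R, gamma) - alpha * (V - Q)

def clipped_advantage_learning(Q, P, R, gamma, alpha, c, q_lower):
    """Assumed clipped variant: apply the gap term only where the gate is on.
    The gate rule is an illustrative assumption, not taken from the abstract."""
    V = Q.max(axis=1, keepdims=True)
    denom = np.maximum(V - q_lower, 1e-12)          # guard against division by zero
    gate = ((Q - q_lower) / denom >= c).astype(Q.dtype)
    return bellman_optimality(Q, P, R, gamma) - alpha * gate * (V - Q)

# Hypothetical 2-state, 2-action MDP, used only to exercise the operators.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.5, 0.5], [1.0, 0.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 0.5]])
Q = np.array([[1.0, 0.2],
              [0.5, 0.4]])
gamma, alpha, c, q_lower = 0.9, 0.5, 0.5, 0.0

T_Q = bellman_optimality(Q, P, R, gamma)
AL_Q = advantage_learning(Q, P, R, gamma, alpha)
CL_Q = clipped_advantage_learning(Q, P, R, gamma, alpha, c, q_lower)
```

In this toy example, plain AL penalizes both non-greedy actions, while the assumed gate leaves the clearly inferior action in state 0 untouched and only penalizes the near-greedy action in state 1. This illustrates what adjusting the advantage adaptively might look like in practice; it is not the authors' reference implementation.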

Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Publication Date: Jun 28, 2022
Citations: 1

Similar Papers

Continuous Control With Swarm Intelligence Based Value Function Approximation
Bi Wang ... Yang Chen
IEEE Transactions on Automation Science and Engineering | VOL. 21
01 Jan 2024

Smoothing Advantage Learning
Yaozhong Gan ... Zhe Zhang
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 36
28 Jun 2022

Power system maintenance planning using value function approximation
Saranga K Abeygunawardane ... Huan Xu
01 Jul 2014

Achieving Efficient and Optimal Joint Action in Distributed Cognitive Radio Networks Using Payoff Propagation
K.-L Yau ... P Komisarczuk
01 May 2010
