Reducing Sampling Error in Policy Gradient Learning

Josiah P. Hanna, Peter Stone

doi:10.5555/3306127.3331798

Abstract

This paper studies a class of reinforcement learning algorithms known as policy gradient methods. Policy gradient methods optimize the performance of a policy by estimating the gradient of the expected return with respect to the policy parameters. A core challenge in applying policy gradient methods is obtaining an accurate estimate of this gradient. Most policy gradient methods rely on Monte Carlo sampling to estimate it. When only a limited number of environment steps can be collected, Monte Carlo policy gradient estimates may suffer from sampling error: samples receive more or less weight than they would in expectation. In this paper, we introduce the Sampling Error Corrected policy gradient estimator, which corrects these inaccurate Monte Carlo weights. Our approach treats the observed data as if it were generated by a different policy than the one that actually generated it, and then applies importance sampling between the two policies, which corrects the inaccurate Monte Carlo weights. Under a restrictive set of assumptions, we show that this gradient estimator has lower variance than the Monte Carlo gradient estimator. Experiments show that our approach improves the learning speed of two policy gradient methods compared to standard Monte Carlo sampling, even when the theoretical assumptions do not hold.
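The weight correction described above can be illustrated with a minimal sketch. It assumes a single-state bandit with a softmax policy and deterministic per-action rewards. It further assumes that the "different policy" is the empirical action distribution of the batch, pi_hat(a) = count(a)/n, so each sample is reweighted by pi(a)/pi_hat(a). All function names are hypothetical, and this is an illustration under those assumptions rather than the authors' implementation.

```python
import numpy as np


def softmax(theta):
    # Numerically stable softmax over per-action preferences.
    z = np.exp(theta - theta.max())
    return z / z.sum()


def grad_log_pi(theta, a):
    # For a softmax policy, d/dtheta log pi(a) = one_hot(a) - pi.
    g = -softmax(theta)
    g[a] += 1.0
    return g


def mc_gradient(theta, actions, rewards):
    # Ordinary Monte Carlo (REINFORCE-style) estimate: every sample gets weight 1/n,
    # so each action is effectively weighted by its observed frequency count(a)/n.
    n = len(actions)
    return sum(grad_log_pi(theta, a) * r for a, r in zip(actions, rewards)) / n


def corrected_gradient(theta, actions, rewards):
    # Hypothetical sampling-error-corrected estimate. Assumption: the data are
    # treated as if generated by the empirical policy pi_hat(a) = count(a)/n,
    # and each sample is importance-weighted by pi(a) / pi_hat(a).
    pi = softmax(theta)
    n = len(actions)
    pi_hat = np.bincount(actions, minlength=len(theta)) / n
    total = np.zeros_like(theta, dtype=float)
    for a, r in zip(actions, rewards):
        total += (pi[a] / pi_hat[a]) * grad_log_pi(theta, a) * r
    return total / n


def exact_gradient(theta, reward_table):
    # True gradient of expected reward: sum over actions of pi(a) * grad log pi(a) * r(a).
    pi = softmax(theta)
    return sum(pi[a] * grad_log_pi(theta, a) * reward_table[a] for a in range(len(theta)))


if __name__ == "__main__":
    theta = np.array([0.5, -0.2, 0.1])        # policy parameters (illustrative)
    reward_table = np.array([1.0, 0.0, 2.0])  # deterministic reward per action (illustrative)
    actions = np.array([0, 0, 0, 1, 2, 2])    # a fixed batch whose frequencies differ from pi
    rewards = reward_table[actions]
    print("pi         :", softmax(theta))
    print("Monte Carlo:", mc_gradient(theta, actions, rewards))
    print("Corrected  :", corrected_gradient(theta, actions, rewards))
    print("Exact      :", exact_gradient(theta, reward_table))
```

In this toy setting, whenever every action appears in the batch, the corrected estimate coincides with the exact gradient, because the importance weights for the samples of each action sum to pi(a). The plain Monte Carlo estimate instead weights each action by its observed frequency count(a)/n, which is exactly the sampling error the abstract describes.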

Similar Papers

Deep Bayesian Quadrature Policy Optimization
Ravi Tej Akella ... Kamyar Azizzadenesheli
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 35
18 May 2021

A distributed adaptive policy gradient method based on momentum for multi-agent reinforcement learning
Junru Shi ... Qingtao Wu
Complex & Intelligent Systems | VOL. 10
12 Jul 2024

Policy Gradient Methods: Variance Reduction and Stochastic Convergence
01 Mar 2005

Local Advantage Actor-Critic for Robust Multi-Agent Deep Reinforcement Learning
Yuchen Xiao ... Christopher Amato
04 Nov 2021