Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

Shicong Cen, Yuting Wei, Yuxin Chen, Yuejie Chi, Chen Cheng

doi:10.1287/opre.2021.2151

Abstract

Preconditioning and Regularization Enable Faster Reinforcement Learning

Natural policy gradient (NPG) methods, in conjunction with entropy regularization to encourage exploration, are among the most popular policy optimization algorithms in contemporary reinforcement learning. Despite their empirical success, the theoretical underpinnings of NPG methods remain severely limited. In “Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization”, Cen, Cheng, Chen, Wei, and Chi develop nonasymptotic convergence guarantees for entropy-regularized NPG methods under softmax parameterization, focusing on tabular discounted Markov decision processes. Assuming access to exact policy evaluation, the authors demonstrate that the algorithm converges linearly at a rate that is independent of the dimension of the state-action space. Moreover, the algorithm is provably stable vis-à-vis inexactness of policy evaluation. Because it accommodates a wide range of learning rates, this convergence result highlights the role of preconditioning and regularization in enabling fast convergence.
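To make the setting in the abstract concrete, here is a minimal sketch of an entropy-regularized NPG iteration on a small tabular discounted MDP. It is an illustration under stated assumptions, not the authors' code: the abstract gives no formulas, so the multiplicative update below is one commonly stated form of the softmax NPG step with entropy regularization, and the toy MDP, the helper names (`make_mdp`, `q_from_v`, `soft_q`, `npg_step`, `run_npg`), and the values of `gamma`, `tau`, and `eta` are all hypothetical choices. Policy evaluation is done by fixed-point iteration, which only approximates exact evaluation up to a small tolerance.

```python
import math
import random


def make_mdp(num_states=3, num_actions=2, seed=0):
    """Random toy MDP (hypothetical): P[s][a] is a distribution over next states."""
    rng = random.Random(seed)
    P, r = [], []
    for _ in range(num_states):
        rows = []
        for _ in range(num_actions):
            w = [rng.random() + 1e-3 for _ in range(num_states)]
            total = sum(w)
            rows.append([x / total for x in w])
        P.append(rows)
        r.append([rng.random() for _ in range(num_actions)])
    return P, r


def q_from_v(P, r, V, gamma):
    """Q(s, a) = r(s, a) + gamma * E_{s'}[V(s')]."""
    return [[r[s][a] + gamma * sum(p * v for p, v in zip(P[s][a], V))
             for a in range(len(r[s]))] for s in range(len(r))]


def soft_q(P, r, pi, gamma, tau, sweeps=300):
    """Entropy-regularized Q of policy pi via fixed-point iteration.

    Stands in for exact policy evaluation; the error shrinks like gamma**sweeps.
    """
    V = [0.0] * len(r)
    for _ in range(sweeps):
        Q = q_from_v(P, r, V, gamma)
        V = [sum(p * (q - tau * math.log(p)) for p, q in zip(pi[s], Q[s]))
             for s in range(len(r))]
    return q_from_v(P, r, V, gamma)


def npg_step(pi, Q, eta, gamma, tau):
    """Assumed update: pi_new(a|s) proportional to
    pi(a|s)**(1 - eta*tau/(1-gamma)) * exp(eta*Q(s,a)/(1-gamma))."""
    keep = 1.0 - eta * tau / (1.0 - gamma)
    new_pi = []
    for probs, qs in zip(pi, Q):
        logits = [keep * math.log(p) + eta * q / (1.0 - gamma)
                  for p, q in zip(probs, qs)]
        top = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(x - top) for x in logits]
        total = sum(weights)
        new_pi.append([w / total for w in weights])
    return new_pi


def run_npg(P, r, gamma=0.8, tau=0.2, eta=None, iters=150):
    """Run the sketch from the uniform policy; returns (policy, soft Q)."""
    if eta is None:
        eta = (1.0 - gamma) / tau  # step size at which the old-policy term vanishes
    num_actions = len(r[0])
    pi = [[1.0 / num_actions] * num_actions for _ in r]
    for _ in range(iters):
        pi = npg_step(pi, soft_q(P, r, pi, gamma, tau), eta, gamma, tau)
    return pi, soft_q(P, r, pi, gamma, tau)
```

Calling `run_npg(*make_mdp())` returns a policy and its regularized Q-values for the toy instance. With `eta = (1 - gamma) / tau` the exponent on the previous policy is zero, so each step reduces to a softmax of `Q / tau`; smaller values of `eta` blend in the previous policy, which is one way to read the abstract's remark about a wide range of learning rates.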

Journal: Operations Research
Publication Date: Dec 2, 2021
Citations: 25

Similar Papers

Geometry and convergence of natural policy gradient methods
Johannes Müller ... Guido Montúfar
Information Geometry | VOL. 7
02 Jun 2023

Implicit incremental natural actor critic algorithm
Ryo Iwaki ... Minoru Asada
Neural Networks | VOL. 109
21 Oct 2018

Implicit Incremental Natural Actor Critic
Ryo Iwaki ... Minoru Asada
01 Jan 2017

From Robots to Reinforcement Learning
Tongchun Du ... Don Perlis
01 Nov 2013
