Natural Policy Gradient Research Articles

Abstract: This paper discusses parameter-based exploration methods for reinforcement learning. Parameter-based methods perturb parameters of a general function approximator directly, rather than adding noise to the resulting actions. Parameter-based exploration unifies reinforcement learning and black-box optimization, and has several advantages over action perturbation. We review two recent parameter-exploring algorithms: Natural Evolution Strategies and Policy Gradients with Parameter-Based Exploration. Both outperform state-of-the-art algorithms in several complex high-dimensional tasks commonly found in robot control. Furthermore, we describe how a novel exploration method, State-Dependent Exploration, can modify existing algorithms to mimic exploration in parameter space.
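The contrast between perturbing actions and perturbing parameters can be made concrete with a small sketch. The code below is only an illustration under simple assumptions (a linear controller, Gaussian noise, and the hypothetical function names `rollout_action_noise` and `rollout_parameter_noise`); it is not taken from the paper and does not implement Natural Evolution Strategies or Policy Gradients with Parameter-Based Exploration.

```python
import random

def linear_policy(theta, state):
    """Deterministic linear controller: action = sum_i theta_i * state_i."""
    return sum(t * s for t, s in zip(theta, state))

def rollout_action_noise(theta, states, sigma, rng):
    """Action perturbation: fresh noise is added to the action at every step."""
    return [linear_policy(theta, s) + rng.gauss(0.0, sigma) for s in states]

def rollout_parameter_noise(theta, states, sigma, rng):
    """Parameter perturbation: sample parameters once, then act deterministically."""
    perturbed = [t + rng.gauss(0.0, sigma) for t in theta]
    return [linear_policy(perturbed, s) for s in states], perturbed

# Toy data (assumed for illustration): the first two states are identical.
rng = random.Random(0)
theta = [0.5, -1.0]
states = [[1.0, 2.0], [1.0, 2.0], [0.0, 1.0]]

noisy_actions = rollout_action_noise(theta, states, 0.1, rng)
param_actions, perturbed = rollout_parameter_noise(theta, states, 0.1, rng)
```

In this sketch the parameter-perturbed rollout returns the same action whenever the same state recurs within an episode, whereas the action-perturbed rollout draws independent noise at each step.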

In this paper, we suggest a novel reinforcement learning architecture, the Natural Actor-Critic. The actor updates are achieved using stochastic policy gradients employing Amari's natural gradient approach, while the critic obtains both the natural policy gradient and additional parameters of a value function simultaneously by linear regression. We show that actor improvements with natural policy gradients are particularly appealing as these are independent of the coordinate frame of the chosen policy representation, and can be estimated more efficiently than regular policy gradients. The critic makes use of a special basis function parameterization motivated by the policy-gradient compatible function approximation. We show that several well-known reinforcement learning methods such as the original Actor-Critic and Bradtke's Linear Quadratic Q-Learning are in fact Natural Actor-Critic algorithms. Empirical evaluations illustrate the effectiveness of our techniques in comparison to previous methods, and also demonstrate their applicability for learning control on an anthropomorphic robot arm.
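The idea that a critic can recover the natural gradient by linear regression can be illustrated on a deliberately tiny problem. The sketch below assumes a single-state task, a Gaussian policy with learnable mean `mu` and fixed `sigma`, and a made-up quadratic reward; the function name `natural_gradient_by_regression` is hypothetical. It regresses sampled rewards on the score function (the compatible feature) plus a constant, and is not the full Natural Actor-Critic algorithm described in the abstract.

```python
import random

def natural_gradient_by_regression(mu, sigma, reward_fn, n_samples, rng):
    """Regress rewards on the score d/dmu log pi(a) plus a constant.

    The slope is an estimate of the natural gradient with respect to mu;
    the intercept acts as a baseline (a one-state value estimate).
    """
    scores, rewards = [], []
    for _ in range(n_samples):
        action = rng.gauss(mu, sigma)
        scores.append((action - mu) / sigma ** 2)  # compatible feature
        rewards.append(reward_fn(action))
    mean_s = sum(scores) / n_samples
    mean_r = sum(rewards) / n_samples
    cov = sum((s - mean_s) * (r - mean_r) for s, r in zip(scores, rewards))
    var = sum((s - mean_s) ** 2 for s in scores)
    slope = cov / var
    intercept = mean_r - slope * mean_s
    return slope, intercept

# Toy setup (assumed for illustration): reward peaks at action == target.
rng = random.Random(0)
mu, sigma, target = 0.0, 0.5, 1.0
w, baseline = natural_gradient_by_regression(
    mu, sigma, lambda a: -(a - target) ** 2, 20000, rng)

# Analytic check for this toy problem: the vanilla gradient is
# -2 * (mu - target), the Fisher information for mu is 1 / sigma**2,
# so the natural gradient is sigma**2 * (-2 * (mu - target)).
analytic = sigma ** 2 * (-2.0 * (mu - target))
```

For this toy problem the regression slope `w` should approach `analytic` as the number of samples grows, and the intercept approaches the expected reward under the current policy.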

Articles published on Natural Policy Gradient

Exploring Parameter Space in Reinforcement Learning

Natural actor-critic with baseline adjustment for variance reduction

Natural Actor-Critic

Off-Policy Natural Policy Gradient Method for a Biped Walking Using a CPG Controller
