Abstract

This research reports on the recent development of black-box optimization methods based on single-step deep reinforcement learning and their conceptual similarity to evolution strategy (ES) techniques. It formally introduces policy-based optimization (PBO), a policy-gradient-based optimization algorithm that relies on a policy network to describe the density function of its forthcoming evaluations, and uses covariance estimation to steer the policy improvement process in the right direction. The specifics of the PBO algorithm are detailed, and its connections to evolution strategies are discussed. Relevance is assessed by benchmarking PBO against classical ES techniques on analytic function minimization problems, and by optimizing various parametric control laws for the Lorenz attractor and the classical cartpole problem. Given the scarce existing literature on the topic, this contribution establishes PBO as a valid, versatile black-box optimization technique, and opens the way to multiple future improvements building on the inherent flexibility of the neural-network approach.
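To make the idea concrete, the sketch below illustrates the general family of methods the abstract describes: a parametric policy (here a diagonal Gaussian, standing in for the paper's policy network) defines the density of the next batch of evaluations, and a REINFORCE-style policy gradient shifts that density toward lower function values. All names, hyperparameters, and update rules are illustrative assumptions, not the actual PBO algorithm from the paper.

```python
import numpy as np

def pbo_sketch(f, dim, iters=200, pop=32, lr=0.1, seed=0):
    """Hedged sketch of single-step policy-gradient black-box minimization.

    A diagonal Gaussian plays the role of the policy network: it describes
    the density of forthcoming evaluations of f, and score-function
    (REINFORCE) gradients improve it. Hyperparameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    mu = np.ones(dim)            # policy mean (start away from the optimum)
    log_sigma = np.zeros(dim)    # policy log standard deviation
    for _ in range(iters):
        sigma = np.exp(log_sigma)
        # Sample a population of candidate points from the current policy
        x = mu + sigma * rng.standard_normal((pop, dim))
        rewards = -np.array([f(xi) for xi in x])  # minimization: reward = -f
        # Normalized advantages act as a simple baseline
        adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
        # Score-function gradients of the diagonal-Gaussian log-density
        g_mu = ((x - mu) / sigma**2) * adv[:, None]
        g_ls = (((x - mu) ** 2) / sigma**2 - 1.0) * adv[:, None]
        mu += lr * g_mu.mean(axis=0)
        log_sigma += 0.5 * lr * g_ls.mean(axis=0)
    return mu

# Toy analytic benchmark: the sphere function, minimized at the origin
sphere = lambda v: float(np.sum(v**2))
best = pbo_sketch(sphere, dim=5)
print(sphere(best))  # should approach 0 as the policy concentrates
```

This is structurally close to a natural evolution strategy, which is exactly the conceptual overlap between single-step RL and ES techniques that the paper examines; the paper's PBO additionally uses a neural policy network and covariance estimation rather than this fixed diagonal parameterization.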

