Abstract
This research reports on the recent development of black-box optimization methods based on single-step deep reinforcement learning and on their conceptual similarity to evolution strategy (ES) techniques. It formally introduces policy-based optimization (PBO), a policy-gradient-based optimization algorithm that relies on a policy network to describe the density function of its forthcoming evaluations, and uses covariance estimation to steer the policy improvement process in the right direction. The specifics of the PBO algorithm are detailed, and its connections to evolution strategies are discussed. Relevance is assessed by benchmarking PBO against classical ES techniques on analytic function minimization problems, and by optimizing various parametric control laws for the Lorenz attractor and the classical cartpole problem. Given the scarce existing literature on the topic, this contribution establishes PBO as a valid, versatile black-box optimization technique, and opens the way to multiple future improvements building on the inherent flexibility of the neural-network approach.
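To make the connection between single-step policy-gradient optimization and evolution strategies concrete, the following is a minimal sketch of the idea, not the paper's exact PBO algorithm: a Gaussian "policy" over candidate solutions is sampled once per iteration (a single-step episode), rewards are rank-shaped as is common in ES, and the policy mean is updated with a REINFORCE-style gradient. All function names and hyperparameters here (`sphere`, `pbo_sketch`, population size, learning rate, fixed standard deviation) are illustrative assumptions.

```python
import numpy as np

def sphere(x):
    """Analytic test function to minimize (global minimum at the origin)."""
    return np.sum(x**2)

def pbo_sketch(f, dim=2, pop=32, iters=200, lr=0.1, sigma=0.3, seed=0):
    """Illustrative single-step policy-gradient black-box optimizer.

    A Gaussian policy N(mu, sigma^2 I) proposes candidates ("actions"),
    each episode is a single objective evaluation, and mu is updated
    with a REINFORCE-style gradient on rank-shaped rewards.
    """
    rng = np.random.default_rng(seed)
    mu = rng.normal(size=dim)               # mean of the Gaussian policy
    for _ in range(iters):
        eps = rng.normal(size=(pop, dim))   # exploration noise
        xs = mu + sigma * eps               # single-step actions from the policy
        costs = np.array([f(x) for x in xs])
        # Rank-based reward shaping (standard in ES) makes the update
        # invariant to monotone transformations of the objective.
        ranks = costs.argsort().argsort()
        rewards = -(ranks - ranks.mean()) / (ranks.std() + 1e-8)
        # REINFORCE gradient of the expected reward w.r.t. the policy mean.
        grad = (rewards[:, None] * eps).mean(axis=0) / sigma
        mu = mu + lr * grad
    return mu

best = pbo_sketch(sphere)
```

With a fixed exploration scale the mean settles in a small neighborhood of the optimum; PBO proper goes further by letting a policy network also adapt the covariance of the sampling distribution.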