Direct policy search with extremum seeking

Ryo Hirotani, Shiro Yano, Toshiyuki Kondo, Megumi Miyashita

doi:10.23919/sice.2017.8105470

Abstract

Given the connection between black-box optimization and reinforcement learning (RL), an RL problem can be solved with a black-box optimization algorithm. Extremum seeking (ES) is a notable black-box optimization algorithm, but there exist only two studies that employ ES to solve a specific RL problem. In this study, we formulate a general RL problem as one whose environment has stochastic non-linear state transitions, and we propose a novel algorithm that solves it with ES. Our method treats the expected cumulative reward as the objective function and employs an online optimization technique. In the evaluation experiment, the proposed method is compared with the previous method PoWER on a robot arm task. The experimental results show that the proposed method is better than PoWER in adaptability. Moreover, they suggest that the previous methods are also able to solve a general RL problem.
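
To illustrate how ES can act as a black-box optimizer, here is a minimal sketch of a textbook-style perturbation-based ES loop. It is not the algorithm proposed in this paper. The function name, the parameters (`amplitude`, `omega`, `gain`, `washout`), and the deterministic quadratic objective are all assumptions chosen for illustration. In an RL setting the objective would instead be an estimate of the expected cumulative reward of a policy with parameter `theta`.

```python
import math


def extremum_seeking(objective, theta0, steps=20000, dt=0.01,
                     amplitude=0.2, omega=5.0, gain=0.5, washout=1.0):
    """Maximize a scalar black-box objective with sinusoidal-dither ES.

    Hypothetical illustration only: all names and default values are
    assumptions, not taken from the paper.
    """
    theta_hat = theta0              # current estimate of the maximizer
    y_avg = objective(theta0)       # low-pass state used to remove the slow trend
    for k in range(steps):
        t = k * dt
        dither = math.sin(omega * t)
        # Probe the objective at the perturbed parameter (black-box query).
        y = objective(theta_hat + amplitude * dither)
        # High-pass filter: subtract a slowly tracked average of the output.
        y_avg += dt * washout * (y - y_avg)
        # Demodulate with the same dither; on average this is proportional
        # to the local gradient, so integrating it climbs the objective.
        theta_hat += dt * gain * (y - y_avg) * dither
    return theta_hat


if __name__ == "__main__":
    # Toy objective with its maximum at theta = 2 (stand-in for a return).
    estimate = extremum_seeking(lambda th: -(th - 2.0) ** 2, theta0=0.0)
    print(estimate)
```

The loop only queries objective values and never uses derivatives, which is what makes the approach applicable when the objective is available only through evaluation, as with a reward signal.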
