A Parameter Optimization Method for Deep Reinforcement Learning by Evolution Strategy Using Multiple Higher-Ranked Individuals

Takahiro Tsuchida, Satoshi Yamaguchi

doi:10.1541/ieejeiss.140.1019

Abstract

Evolution Strategy (ES) has been proposed as a method for optimizing the parameters of neural networks used in reinforcement learning. As in ordinary evolution strategies, each individual represents a set of neural network parameters. During evolution, new individuals are sampled from a distribution centered on the current parameters. Each individual is weighted according to the rank of the reward obtained by its corresponding neural network, and the parameters are updated using these weights. However, the differences in reward among the higher-ranked individuals are sometimes so small that the update fails to move the parameters toward higher-quality individuals. In this research, we therefore propose a method that, after the usual parameter update, selects the top individuals with high rewards, weights them, and updates the parameters a second time using only those individuals. By focusing on high-reward individuals, the proposed method is expected to find parameters that achieve high scores sooner than the conventional method. In experiments, we applied the conventional and proposed methods to BipedalWalker, a 2D biped robot learning environment in OpenAI Gym. The proposed method performed better than the conventional method.
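The two-stage update described above can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption. Stage (1) is a standard ES step with centered-rank fitness shaping, as in Salimans et al. (2017). Stage (2) is one hypothetical way to "update the parameters again" from the top individuals: the best `top_k` perturbations receive linearly decreasing weights, and `theta` moves toward their weighted mean. The names `centered_ranks`, `es_step`, `top_k`, `top_lr` and all default hyperparameters are illustrative and do not come from the paper. `reward_fn` stands in for the episode return of a policy network, such as one BipedalWalker rollout.

```python
import random
from typing import Callable, List

Vector = List[float]


def centered_ranks(rewards: List[float]) -> List[float]:
    """Map rewards to rank-based weights in [-0.5, 0.5] (best = +0.5, worst = -0.5)."""
    n = len(rewards)
    order = sorted(range(n), key=lambda i: rewards[i])  # indices by ascending reward
    weights = [0.0] * n
    for rank, i in enumerate(order):
        weights[i] = rank / (n - 1) - 0.5
    return weights


def es_step(theta: Vector, reward_fn: Callable[[Vector], float], rng: random.Random,
            pop_size: int = 20, sigma: float = 0.1, lr: float = 0.05,
            top_k: int = 5, top_lr: float = 0.5) -> Vector:
    """One generation: conventional rank-weighted ES update, then a (hypothetical)
    second update that uses only the top_k highest-reward individuals."""
    dim = len(theta)
    # Individual i has parameters theta + sigma * eps_i, with eps_i ~ N(0, I).
    noise = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    rewards = [reward_fn([t + sigma * e for t, e in zip(theta, eps)]) for eps in noise]

    # (1) Conventional update: sum of perturbations weighted by reward rank.
    w = centered_ranks(rewards)
    scale = lr / (pop_size * sigma)
    theta = [t + scale * sum(w[i] * noise[i][j] for i in range(pop_size))
             for j, t in enumerate(theta)]

    # (2) Assumed extra update: weight the top_k individuals linearly by rank
    # (best gets top_k, k-th best gets 1), normalize the weights to sum to 1,
    # and move theta a fraction top_lr toward their weighted mean offset.
    top = sorted(range(pop_size), key=lambda i: rewards[i], reverse=True)[:top_k]
    raw = [top_k - r for r in range(top_k)]
    top_w = [x / sum(raw) for x in raw]
    theta = [t + top_lr * sigma * sum(top_w[r] * noise[i][j] for r, i in enumerate(top))
             for j, t in enumerate(theta)]
    return theta
```

As a toy check, you could replace the episode return with a quadratic reward such as `-||x - (3, -2)||^2`. Start from `theta = [0.0, 0.0]` and call `es_step` repeatedly with a seeded `random.Random`; `theta` should drift toward the maximizer. Setting `top_lr=0.0` recovers the conventional update alone, which gives a simple way to compare the two variants under these assumed hyperparameters.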
