Abstract

Evolution Strategy (ES) has been proposed as a parameter optimization method for neural networks in reinforcement learning. In this method, as in ordinary evolution strategies, each individual represents a set of neural network parameters. During evolution, new individuals are generated from a distribution centered on the current parameters, and each individual is weighted according to the rank of the reward obtained by its corresponding neural network. However, there are cases in which the differences in reward among the top-ranked individuals are so small that the update does not lead to higher-quality individuals. In this research, we therefore propose a method that, after the normal parameter update, selects the top individuals with high rewards, weights them, and updates the parameters again using those individuals. By focusing on high-reward individuals, the method is expected to find parameters that achieve a high score earlier than the conventional method. In the experiment, the conventional and proposed methods are applied to BipedalWalker, a learning environment for a 2D biped robot in OpenAI Gym, and the proposed method shows better performance than the conventional method.
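
The two-stage update described above can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so everything specific here is an assumption made for illustration: the centered-rank weights in the first step, the log-rank weights and step size in the second step, the hyperparameter values, and all function names (`centered_ranks`, `es_step`, `run_demo`). A toy quadratic reward stands in for a BipedalWalker episode so that the example runs without a simulator.

```python
import numpy as np


def centered_ranks(rewards):
    """Rank-based weights in [-0.5, 0.5]; the best individual gets +0.5."""
    ranks = np.empty(len(rewards), dtype=float)
    ranks[np.argsort(rewards)] = np.arange(len(rewards))
    return ranks / (len(rewards) - 1) - 0.5


def es_step(theta, reward_fn, rng, pop_size=50, sigma=0.1,
            lr=0.05, n_top=5, top_lr=0.5):
    """One generation: rank-weighted update, then a second update from the top individuals."""
    # Each individual is the current parameter vector plus Gaussian noise.
    noise = rng.standard_normal((pop_size, theta.size))
    rewards = np.array([reward_fn(theta + sigma * eps) for eps in noise])

    # Step 1: conventional update, every individual weighted by its reward rank.
    theta = theta + lr / (pop_size * sigma) * (centered_ranks(rewards) @ noise)

    # Step 2 (assumed form, not taken from the paper): re-weight only the
    # n_top best individuals and move again along their perturbations.
    top = np.argsort(rewards)[::-1][:n_top]          # indices, best first
    w = np.log(n_top + 0.5) - np.log(np.arange(1, n_top + 1))
    w /= w.sum()                                     # positive, sums to 1
    return theta + top_lr * sigma * (w @ noise[top])


def run_demo(steps=200, seed=0):
    """Toy stand-in for an RL episode: reward is the negative squared distance to a target."""
    rng = np.random.default_rng(seed)
    target = np.ones(5)
    reward_fn = lambda x: -float(np.sum((x - target) ** 2))
    theta = np.zeros(5)
    initial = reward_fn(theta)
    for _ in range(steps):
        theta = es_step(theta, reward_fn, rng)
    return initial, reward_fn(theta)
```

Calling `run_demo()` returns the reward before and after training on the toy problem. In a real experiment, `reward_fn` would instead load the parameters into a policy network and return the episode return from the environment; that part is omitted here because the abstract does not specify the network or the evaluation protocol.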

