Abstract
Evolutionary Computation (EC) has attracted increasing attention in Reinforcement Learning (RL), with successful applications such as robot control. The Instance-Based Policy (IBP) is a promising alternative to policy representations based on Artificial Neural Networks (ANNs). The IBP has been reported to be superior to continuous policy representations such as ANNs in the stabilization control of non-holonomic systems, owing to its bang-bang control nature and its interpretability. A difficulty in applying EC-based policy optimization to an RL task is choosing appropriate hyperparameters, such as the network structure in ANNs and the parameters of the EC method. The same applies to the IBP, where the critical parameter is the number of instances, which determines model flexibility. In this paper, we propose a novel RL method that combines the IBP representation with optimization by the Covariance Matrix Adaptation Evolution Strategy (CMA-ES), a state-of-the-art general-purpose search algorithm for black-box continuous optimization. The proposed method, called IBP-CMA, is a direct policy search that adapts the number of instances during the learning process and activates instances that do not contribute to the output. In simulations, IBP-CMA is compared with an ANN-based RL method, CMA-TWEANN.
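To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of optimizing an instance-based policy with CMA-ES: the instance coordinates are flattened into a single real-valued vector, the policy emits the bang-bang action of the nearest instance, and CMA-ES searches this vector to maximize episode return. The toy stabilization task, the fixed instance count, and the action assignment are illustrative assumptions; in particular, IBP-CMA's adaptation of the number of instances is omitted here.

```python
# Minimal sketch (assumptions, not the paper's code): an instance-based
# policy (IBP) whose instance coordinates are flattened into one real
# vector and optimized by CMA-ES via the `cma` package (pip install cma).
import numpy as np
import cma

N_INSTANCES = 6          # fixed here; IBP-CMA adapts this during learning
STATE_DIM = 2
ACTIONS = (-1.0, +1.0)   # bang-bang outputs; assigned to instances alternately

def ibp_action(params, state):
    """Nearest-neighbor lookup: the closest instance decides the action."""
    instances = params.reshape(N_INSTANCES, STATE_DIM)
    nearest = np.argmin(np.linalg.norm(instances - state, axis=1))
    return ACTIONS[nearest % len(ACTIONS)]

def episode_return(params, horizon=200):
    """Toy double-integrator stabilization task; reward is -|position|."""
    pos, vel, total = 1.0, 0.0, 0.0
    for _ in range(horizon):
        u = ibp_action(params, np.array([pos, vel]))
        vel += 0.05 * u
        pos += 0.05 * vel
        total += -abs(pos)
    return total

# CMA-ES minimizes, so the episode return is negated.
es = cma.CMAEvolutionStrategy(np.zeros(N_INSTANCES * STATE_DIM), 0.5)
while not es.stop():
    candidates = es.ask()
    es.tell(candidates, [-episode_return(np.asarray(c)) for c in candidates])
es.result_pretty()
```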