Abstract

In this article, we propose a novel variant of path integral policy improvement with covariance matrix adaptation (PI²-CMA), a reinforcement learning (RL) algorithm that optimizes a parameterized policy for continuous robot behavior. PI²-CMA has a hyperparameter called the temperature parameter, whose value is critical to performance; however, it has received little attention, and the existing tuning method still contains a free parameter that must itself be set by trial and error. Moreover, we show that there is a problem setting that the existing method cannot learn. The proposed method solves both problems by automatically adjusting the temperature parameter at each update. We confirmed the effectiveness of the proposed method through numerical experiments.
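For context, the sketch below illustrates how the temperature parameter enters the standard PI² rollout weighting, where each rollout's exploration noise is weighted by the softmax of its negated trajectory cost. This is only an illustration of the hyperparameter's role under common assumptions about PI²; it does not reproduce the per-update adaptation rule proposed in this article, and the names `pi2_weights`, `costs`, and `temperature` are illustrative.

```python
import numpy as np

def pi2_weights(costs, temperature):
    """Standard PI^2-style rollout weighting (illustrative sketch only):
    a rollout with trajectory cost S_k gets weight proportional to
    exp(-S_k / temperature), then weights are normalized to sum to 1."""
    costs = np.asarray(costs, dtype=float)
    # Shift by the minimum cost for numerical stability; the shift
    # cancels out after normalization.
    shifted = costs - costs.min()
    w = np.exp(-shifted / temperature)
    return w / w.sum()

# Example: the same rollout costs weighted with two temperatures.
costs = [4.0, 5.0, 9.0]
print(pi2_weights(costs, temperature=0.5))   # low temperature: near-greedy weights
print(pi2_weights(costs, temperature=10.0))  # high temperature: near-uniform weights
```

A small temperature concentrates the policy update on the lowest-cost rollouts, while a large temperature averages over all rollouts almost uniformly, which is why a poorly chosen fixed value can stall learning.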
