Abstract

In this article, we propose a novel variant of path integral policy improvement with covariance matrix adaptation ($\mathrm{PI}^2$-CMA), a reinforcement learning (RL) algorithm that optimizes a parameterized policy for the continuous behavior of robots. $\mathrm{PI}^2$-CMA has a hyperparameter called the temperature parameter, whose value is critical for performance; however, it has received little attention, and the existing adjustment method still contains a tunable parameter that can itself be critical to performance, so tuning by trial and error remains necessary. Moreover, we show that there is a problem setting that the existing method cannot learn. The proposed method solves both problems by automatically adjusting the temperature parameter at each update. We confirm the effectiveness of the proposed method through numerical experiments.
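
To make the role of the temperature parameter concrete, the following is a minimal sketch (not the authors' method) of a PI^2-style update, assuming the common formulation in which rollout costs are mapped to softmax weights and the weighted samples drive the mean and covariance update; the names `pi2_weights`, `update_mean_and_cov`, and `temperature` are illustrative assumptions.

```python
import numpy as np

def pi2_weights(costs, temperature):
    """Map rollout costs S_k to weights w_k = exp(-S_k / temperature) / sum_j exp(-S_j / temperature).
    Smaller temperature concentrates weight on the lowest-cost rollouts."""
    costs = np.asarray(costs, dtype=float)
    shifted = costs - costs.min()            # shift for numerical stability
    logits = -shifted / temperature
    w = np.exp(logits - logits.max())
    return w / w.sum()

def update_mean_and_cov(theta_samples, costs, temperature):
    """Reward-weighted averaging of sampled policy parameters, with a
    weighted covariance update in the spirit of PI^2-CMA (illustrative only)."""
    w = pi2_weights(costs, temperature)      # shape (K,)
    mean = w @ theta_samples                 # weighted mean of parameter samples
    diff = theta_samples - mean
    cov = (w[:, None] * diff).T @ diff       # weighted sample covariance
    return mean, cov

# Example: 5 rollouts of a 3-dimensional policy parameter vector.
rng = np.random.default_rng(0)
theta_samples = rng.normal(size=(5, 3))
costs = rng.uniform(size=5)
mean, cov = update_mean_and_cov(theta_samples, costs, temperature=0.1)
print(mean)
print(cov)
```

In this sketch a fixed `temperature` must be chosen by hand; the article's proposal is to adjust this value automatically at each update rather than fixing it in advance.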
