Learning from Demonstration (LfD) is an important way for robots to acquire new skills: an optimized trajectory is obtained from an instructor's kinesthetic teaching or from regression over encoded motions, and by learning this demonstrated trajectory the robot can master the corresponding movement skill. However, adapting the learned motor skills to additional constraints that arise during a task can be challenging. This paper proposes a noise-exploration method, PI^BB-CMA, which improves the Policy Improvement through Black-Box Optimization (PI^BB) algorithm by using an adaptive covariance matrix. In our method, the demonstrated trajectory is first collected with our designed inertial motion capture system. A more reasonable trajectory is then synthesized using a Gaussian Mixture Model (GMM) and Gaussian Mixture Regression (GMR), and serves as the input to Dynamic Movement Primitives (DMP). Finally, the improved PI^BB-CMA algorithm performs noise exploration to ensure that the robot arm passes through the designated via-points during path planning. Furthermore, three baseline methods are chosen for comparison to validate the robustness of the proposed method. The experimental results demonstrate that our method outperforms the baseline algorithms and is feasible for improving robot intelligence. We believe that our method has considerable potential for robot skill learning.
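The abstract does not give the DMP formulation; as background, a minimal sketch of a standard one-dimensional discrete DMP rollout (the Ijspeert-style spring-damper with a phase-driven forcing term) is shown below. All parameter names and gain values here are illustrative assumptions, not taken from the paper; the paper's GMR output would supply the reference trajectory from which the forcing-term weights are fit.

```python
import numpy as np

def dmp_rollout(y0, g, weights, tau=1.0, alpha_z=25.0, alpha_s=4.0,
                dt=0.01, T=1.0):
    """Roll out a 1-D discrete DMP (illustrative, not the paper's exact setup).

    `weights` parameterize the nonlinear forcing term over Gaussian basis
    functions of the phase variable s; with zero weights the system is a
    critically damped spring-damper converging to the goal g.
    """
    beta_z = alpha_z / 4.0                                # critical damping
    n = len(weights)
    centers = np.exp(-alpha_s * np.linspace(0, 1, n))     # basis centers in phase space
    widths = 1.0 / (np.diff(centers, append=centers[-1])**2 + 1e-8)
    y, z, s = float(y0), 0.0, 1.0                         # position, scaled velocity, phase
    traj = []
    for _ in range(int(T / dt)):
        psi = np.exp(-widths * (s - centers)**2)          # basis activations
        f = s * (g - y0) * psi.dot(weights) / (psi.sum() + 1e-10)  # forcing term
        z += dt / tau * (alpha_z * (beta_z * (g - y) - z) + f)
        y += dt / tau * z
        s += dt / tau * (-alpha_s * s)                    # canonical system: phase decays
        traj.append(y)
    return np.array(traj)
```

Noise exploration of the kind PI^BB performs then amounts to perturbing `weights` with samples from a covariance matrix and keeping the perturbations that improve a task cost; the proposed PI^BB-CMA adapts that covariance rather than keeping it fixed.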