Obtaining a highly accurate dynamic model from video data is a challenging task in system identification. In robotics in particular, there is a constant need for precise models to optimize the performance and safety of these systems. With the developments in machine learning, algorithms such as convolutional neural networks (CNNs) have shown good performance in extracting features from images, and techniques like transfer learning (TL) can lower the computational cost and streamline training. This work performs gray-box identification of the friction parameters of an actuator from video. The proposed approach consists of three steps. First, the transfer learning performance of three pre-trained CNNs is compared for estimating the pose of the motor and the link from video data. Second, a long short-term memory (LSTM) network is added to estimate the velocity and acceleration. Third, the parameters of two friction models are estimated using the estimated states and the dynamic equations of the system. Together, these steps yield a vision-based gray-box identification. The approach is tested on an original elastomer-based series elastic actuator (eSEA) benchmark. The results show that the VGG19 model combined with the LSTM layer is the best-performing pre-trained CNN for the vision module, achieving a coefficient of determination above 0.96 for all three states. From the state estimates, the parameters of the friction models are optimized. The best-performing friction model is the Coulomb-Stribeck model, which reduces the mean absolute error (MAE) by 15.50% compared to the linear viscous model.
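The abstract describes the vision module only at a high level (a pre-trained CNN backbone followed by an LSTM that regresses position, velocity, and acceleration). As a rough illustration of that kind of architecture, a minimal PyTorch sketch might look like the following; the layer sizes, pooling, sequence handling, and class name are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch (assumed architecture): VGG19 features -> LSTM -> 3 states.
# Hidden size, pooling, and output convention are illustrative assumptions.
import torch
import torch.nn as nn
from torchvision.models import vgg19, VGG19_Weights


class VisionStateEstimator(nn.Module):
    def __init__(self, hidden_size=128, num_states=3):
        super().__init__()
        backbone = vgg19(weights=VGG19_Weights.DEFAULT)
        self.features = backbone.features           # pre-trained conv layers (transfer learning)
        self.pool = nn.AdaptiveAvgPool2d((7, 7))
        self.lstm = nn.LSTM(input_size=512 * 7 * 7,
                            hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, num_states)  # position, velocity, acceleration

    def forward(self, frames):
        # frames: (batch, seq_len, 3, H, W) video clips
        b, t = frames.shape[:2]
        x = frames.flatten(0, 1)                     # merge batch and time for the CNN
        x = self.pool(self.features(x)).flatten(1)   # per-frame feature vector
        x = x.view(b, t, -1)
        out, _ = self.lstm(x)                        # temporal context across frames
        return self.head(out)                        # per-frame state estimates
```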
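The friction-fitting step can be illustrated similarly. A common Coulomb-Stribeck formulation is $\tau_f(\omega) = \left(\tau_c + (\tau_s - \tau_c)\, e^{-(\omega/\omega_s)^2}\right)\operatorname{sgn}(\omega) + b\,\omega$; since the abstract does not specify the exact model variant, optimizer, or dynamic equations, the sketch below fits these parameters to estimated states by nonlinear least squares under assumed, synthetic data.

```python
# Hedged sketch: fit Coulomb-Stribeck friction parameters from estimated
# states via nonlinear least squares. The model variant and the data here
# are illustrative assumptions standing in for the paper's eSEA dynamics.
import numpy as np
from scipy.optimize import least_squares


def coulomb_stribeck(omega, tau_c, tau_s, omega_s, b):
    """Friction torque: Coulomb + Stribeck exponential + viscous term."""
    return (tau_c + (tau_s - tau_c) * np.exp(-(omega / omega_s) ** 2)) \
        * np.sign(omega) + b * omega


def fit_friction(omega, tau_f):
    """Estimate (tau_c, tau_s, omega_s, b) from velocity/friction samples."""
    def residuals(p):
        return coulomb_stribeck(omega, *p) - tau_f
    x0 = np.array([0.1, 0.2, 0.5, 0.01])            # rough initial guess
    sol = least_squares(residuals, x0,
                        bounds=([0, 0, 1e-3, 0], np.inf))
    return sol.x


# Example with synthetic data standing in for the LSTM state estimates:
rng = np.random.default_rng(0)
omega = rng.uniform(-2.0, 2.0, 500)
tau_f = coulomb_stribeck(omega, 0.15, 0.30, 0.4, 0.05) \
    + 0.005 * rng.standard_normal(500)
print(fit_friction(omega, tau_f))
```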