Abstract

Due to the high degrees of freedom of a space flexible manipulator, an accurate dynamic model for its motion planning is difficult to obtain. In this work, we formulate precise motion control of the free-floating space piecewise constant curvature (FSPCC) continuum manipulator (i.e., a space flexible manipulator) as a sparse-reward problem in reinforcement learning, and use the Soft Actor-Critic (SAC) algorithm together with the Random Network Distillation (RND) method to train optimal policies. Firstly, the RND method trains a predictor network against a fixed, randomly initialized target network; the discrepancy between the outputs of the two networks serves as an intrinsic reward from the environment. Secondly, the SAC algorithm maximizes both the expected return and the entropy of the policy, so that a high-entropy policy completes the task while acting as randomly as possible. Finally, the intrinsic rewards incentivize the agent to explore more widely, which speeds up convergence. We apply this method to a simulation model of the FSPCC continuum manipulator, and the results demonstrate that the SAC algorithm combined with RND can control the manipulator to capture the target quickly, even under sparse rewards.
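
As an illustration only (not code from the paper), the following is a minimal PyTorch sketch of the RND intrinsic-reward mechanism described above. The network architecture, the state dimension obs_dim, the embedding size, and the learning rate are all assumptions chosen for readability.

    import torch
    import torch.nn as nn

    def make_net(obs_dim, emb_dim=64):
        return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(),
                             nn.Linear(128, emb_dim))

    obs_dim = 12                   # hypothetical state size of the manipulator
    target = make_net(obs_dim)     # fixed, randomly initialized target network
    predictor = make_net(obs_dim)  # predictor network trained to imitate the target
    for p in target.parameters():  # the target network is never updated
        p.requires_grad_(False)

    opt = torch.optim.Adam(predictor.parameters(), lr=1e-4)

    def intrinsic_reward(states):
        """Prediction error between the two networks serves as the intrinsic reward."""
        with torch.no_grad():
            return (predictor(states) - target(states)).pow(2).mean(dim=-1)

    def update_predictor(states):
        """Minimize the same error on visited states, so frequently visited
        states yield a low reward while novel states remain rewarding."""
        loss = (predictor(states) - target(states)).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

Because the predictor only learns to match the target on states the agent has actually visited, the residual error stays large in unexplored regions of the state space, which is what pushes the agent to explore more widely.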
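For reference, the entropy-regularized objective that SAC maximizes can be written as

    J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi} \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\!\left(\pi(\cdot \mid s_t)\right) \right]

where \mathcal{H} is the policy entropy and \alpha is the temperature coefficient trading off exploration against return. In the sparse-reward setting described above, a common way to combine the two reward signals is r = r_{\text{ext}} + \beta \, r_{\text{int}}, with \beta weighting the RND intrinsic reward; this combination rule and \beta are illustrative assumptions, not values stated in the paper.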
