Abstract

To address the problem of autonomous learning control for a seven-axis collaborative manipulator, a deep reinforcement learning control method based on delayed policy updates and action/target-policy noise is proposed. First, a simulation environment is established, comprising the seven-axis manipulator, the target, and the workspace. The simulation environment and the deep reinforcement learning network then interact through state variables. Using two reward functions designed on different bases, the performance of control methods based on Twin Delayed Deep Deterministic Policy Gradient (TD3) and Deep Deterministic Policy Gradient (DDPG) is compared and analysed. The simulation experiments show that the time-related reward function exhibits a degree of generality across deterministic policy gradient (DPG) methods. Finally, it is concluded that the TD3-based control method offers better robustness and control performance.
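The abstract does not give implementation details, but the TD3 mechanisms it refers to (delayed policy updates, target-policy smoothing noise, and clipped double critics) follow a standard pattern. The following is a minimal PyTorch sketch of a single TD3 update step for illustration only; the network sizes, state/action dimensions, hyperparameters, and reward are assumptions and not the authors' settings.

```python
# Minimal TD3 update sketch (illustrative; dimensions and hyperparameters are assumed).
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, MAX_ACTION = 17, 7, 1.0  # assumed 7-joint action space

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

actor, actor_targ = mlp(STATE_DIM, ACTION_DIM), mlp(STATE_DIM, ACTION_DIM)
critic1, critic2 = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
critic1_targ, critic2_targ = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
for targ, src in [(actor_targ, actor), (critic1_targ, critic1), (critic2_targ, critic2)]:
    targ.load_state_dict(src.state_dict())

actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(list(critic1.parameters()) + list(critic2.parameters()), lr=3e-4)

GAMMA, TAU, POLICY_NOISE, NOISE_CLIP, POLICY_DELAY = 0.99, 0.005, 0.2, 0.5, 2

def td3_update(batch, step):
    # batch: (state, action, reward, next_state, done) tensors of shape (N, dim),
    # with reward and done shaped (N, 1), sampled from a replay buffer.
    s, a, r, s2, done = batch

    with torch.no_grad():
        # Target-policy smoothing: add clipped Gaussian noise to the target action.
        noise = (torch.randn_like(a) * POLICY_NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        a2 = (torch.tanh(actor_targ(s2)) * MAX_ACTION + noise).clamp(-MAX_ACTION, MAX_ACTION)
        # Clipped double-Q: take the smaller of the two target critic estimates.
        q_targ = torch.min(critic1_targ(torch.cat([s2, a2], 1)),
                           critic2_targ(torch.cat([s2, a2], 1)))
        y = r + GAMMA * (1 - done) * q_targ

    q1 = critic1(torch.cat([s, a], 1))
    q2 = critic2(torch.cat([s, a], 1))
    critic_loss = nn.functional.mse_loss(q1, y) + nn.functional.mse_loss(q2, y)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Delayed policy update: refresh the actor and targets only every POLICY_DELAY steps.
    if step % POLICY_DELAY == 0:
        pi = torch.tanh(actor(s)) * MAX_ACTION
        actor_loss = -critic1(torch.cat([s, pi], 1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        for targ, src in [(actor_targ, actor), (critic1_targ, critic1), (critic2_targ, critic2)]:
            for p_t, p in zip(targ.parameters(), src.parameters()):
                p_t.data.mul_(1 - TAU).add_(TAU * p.data)
```

Removing the target smoothing noise, the second critic, and the update delay from this sketch reduces it to a DDPG-style update, which is essentially the comparison the paper describes.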
