Abstract

End-to-end visual servoing (VS) based on reinforcement learning (RL) can simplify the design of features and control laws and scales well when combined with neural networks. However, RL-based VS tasks struggle in continuous state and action spaces because exploration is difficult and training converges slowly. Hence, this paper presents a novel novelty-measurement method based on centralized features extracted by a neural network, which quantifies the novelty of each visited state to encourage exploration by the RL agent. Moreover, we propose a hybrid probability sampling method that improves Prioritized Experience Replay (PER), originally based on the Temporal-Difference (TD) error, by integrating the intrinsic and extrinsic rewards. These two signals represent the novelty and quality of transitions in the replay buffer, respectively, and promote convergence during training. Finally, we develop an end-to-end VS scheme based on the maximum-entropy RL algorithm Soft Actor-Critic (SAC). Several simulated experiments are designed in CoppeliaSim for end-to-end VS, where target detection information serves as the agent's input. The results show that our method's reward value and completion rate are 0.35 and 8.0% higher, respectively, than those of the SAC VS baseline. We also conduct experiments to verify the effectiveness of the proposed algorithm.
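To make the hybrid sampling idea concrete, the sketch below shows one plausible way a replay buffer could mix the TD error (transition quality, tied to the extrinsic reward) with a state-novelty score (intrinsic reward) when computing sampling priorities. The class name `HybridPriorityBuffer`, the weighted-sum combination, and the weights `w_td` and `w_nov` are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

class HybridPriorityBuffer:
    """Illustrative PER-style buffer whose priority mixes |TD error|
    (transition quality) with an intrinsic novelty score (transition
    novelty). The mixing rule is an assumption for illustration only."""

    def __init__(self, capacity, alpha=0.6, w_td=0.7, w_nov=0.3, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # priority exponent, as in standard PER
        self.w_td = w_td        # weight on |TD error| (hypothetical)
        self.w_nov = w_nov      # weight on state novelty (hypothetical)
        self.eps = eps
        self.data, self.priorities = [], []

    def add(self, transition, td_error, novelty):
        # Hybrid priority: combine quality (TD error) and novelty signals.
        p = (self.w_td * abs(td_error) + self.w_nov * novelty + self.eps) ** self.alpha
        if len(self.data) >= self.capacity:
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append(p)

    def sample(self, batch_size):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** -1
        weights = weights / weights.max()
        return [self.data[i] for i in idx], idx, weights
```

In this sketch the novelty score would come from the centralized features described above (e.g., distance of the current state's feature vector to previously visited ones); the exact novelty measure and weighting used in the paper may differ.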
