Abstract

This paper presents a feature-agnostic, model-free visual servoing (VS) technique using deep reinforcement learning (DRL) that exploits two new experience replay buffer architectures within deep deterministic policy gradient (DDPG). The proposed architectures are significantly faster and converge within a small number of steps. We use the proposed method to learn end-to-end VS with an eye-in-hand configuration. In traditional DDPG, the experience replay memory is sampled uniformly at random to train the actor-critic network. This wastes useful experiences when the buffer contains very few successful examples. We address this problem with two new replay buffer architectures: (a) min-heap DDPG (mH-DDPG) and (b) dual replay buffer DDPG (dR-DDPG). The former implements the replay buffer as a min-heap, whereas the latter uses two buffers to separate “good” examples from “bad” ones; the training data for the actor-critic network is then drawn as a weighted combination of the two buffers. The proposed algorithms are validated in simulation with a UR5 robotic manipulator model. We observe that as the proportion of good experiences in the training data increases, the convergence time decreases. We find 27.25% and 43.25% improvements in the rate of convergence by mH-DDPG and dR-DDPG, respectively, over state-of-the-art DDPG.
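The dual-buffer idea in dR-DDPG can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class name, the capacity, the `good_fraction` weight, and the criterion for labeling a transition "good" are all assumptions made here for clarity.

```python
import random
from collections import deque

class DualReplayBuffer:
    """Sketch of a dual replay buffer: "good" (successful) transitions are
    stored separately from "bad" ones, and each training batch is a
    weighted combination of samples from both buffers."""

    def __init__(self, capacity=10000, good_fraction=0.5):
        self.good = deque(maxlen=capacity)   # e.g. transitions from successful episodes
        self.bad = deque(maxlen=capacity)    # all other transitions
        self.good_fraction = good_fraction   # hypothetical weight of "good" samples per batch

    def add(self, transition, is_good):
        # The caller decides what counts as "good" (e.g. episode reached the goal).
        (self.good if is_good else self.bad).append(transition)

    def sample(self, batch_size):
        # Draw a weighted mix from the two buffers, falling back to whatever
        # is available when one buffer is still nearly empty.
        n_good = min(int(batch_size * self.good_fraction), len(self.good))
        n_bad = min(batch_size - n_good, len(self.bad))
        return random.sample(self.good, n_good) + random.sample(self.bad, n_bad)
```

With `good_fraction=0.5`, half of each batch comes from successful experiences even when they are rare in the overall stream, which is the mechanism the abstract credits for the faster convergence.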
