Abstract

Actor-critic deep reinforcement learning methods, such as the deep deterministic policy gradient (DDPG) algorithm, have recently achieved competent performance on continuous action control tasks. However, DDPG has shortcomings in practice: the actor network and the critic network are tightly coupled, and the critic network is prone to overestimation, both of which can lead to poor policy updates. Unlike conventional double-network approaches that take the minimum of two critic estimates, the proposed method applies separate target actor networks to generate candidate actions when the critic network is updated. Given the double-network structure, the better of the two candidate actions is preferred, which helps speed up network convergence. This paper proposes the Double-Net DDPG algorithm, which combines double actor networks and double critic networks with an optimal action selection mechanism, reducing the network dependency present in DDPG. Experimental results show that the proposed algorithm improves on the original and achieves excellent performance on continuous action control tasks.
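
The following is a minimal sketch (PyTorch) of the target computation described above. All network names (actor1_target, critic1_target, etc.) and the exact selection rule are assumptions for illustration only; the paper's concrete update may differ in detail.

```python
import torch


def double_net_target(next_state, reward, done, gamma,
                      actor1_target, actor2_target,
                      critic1_target, critic2_target):
    """Sketch of a bootstrapped TD target using two target actors.

    Instead of taking the element-wise minimum over two critic estimates
    (as in conventional double-critic methods), each target actor proposes
    a candidate action; the action with the higher estimated value is kept
    and used to form the target. Signatures and names are hypothetical.
    """
    with torch.no_grad():
        # Each target actor proposes a candidate action for the next state.
        a1 = actor1_target(next_state)
        a2 = actor2_target(next_state)

        # Evaluate each candidate action with its target critic.
        q1 = critic1_target(next_state, a1)
        q2 = critic2_target(next_state, a2)

        # Optimal action selection: prefer the higher-valued candidate.
        better_q = torch.where(q1 >= q2, q1, q2)

        # Standard TD target.
        return reward + gamma * (1.0 - done) * better_q
```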
