Abstract

Deep Reinforcement Learning (DRL), one of the most popular research topics in artificial intelligence, has achieved breakthroughs in continuous control tasks. Nonetheless, the instability and local optimality of DRL algorithms harm their performance. The Deep Deterministic Policy Gradient (DDPG) algorithm alleviates this problem by using a "soft" update to slow the rate of change of the target values. However, a target approximation error variance remains; this variance increases the dispersion of the data and reduces the stability of the model. This paper proposes the DDPG with averaged state-action estimation (Averaged-DDPG) algorithm. It aims to minimize the adverse effects of this variance by computing the action-value target as the average of previously learned Q-value estimates, thus reducing fluctuation during training and improving the algorithm's performance. Evaluation results on continuous control tasks show that Averaged-DDPG improves the agent's learning efficiency and training stability more effectively than the original DDPG algorithm.
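To illustrate the averaging idea described above, the following is a minimal sketch (not the authors' code): the DDPG target is formed from the mean of the Q estimates of several previously learned target-critic snapshots instead of a single target critic. The network architecture, the number of snapshots K, the batch size, and the other hyperparameters are illustrative assumptions.

# Sketch of an averaged target Q estimate for DDPG (assumed details, not the paper's implementation).
import copy
from collections import deque

import torch
import torch.nn as nn

class Critic(nn.Module):
    """A small Q-network Q(s, a); the hidden size is an arbitrary choice."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def averaged_target_q(target_critics, next_state, next_action):
    """Average the Q estimates over the stored target-critic snapshots."""
    qs = torch.stack([c(next_state, next_action) for c in target_critics])
    return qs.mean(dim=0)

# Usage: keep a small buffer of the K most recent target-critic snapshots and
# average them when forming the TD target y = r + gamma * mean_k Q_k(s', mu'(s')).
state_dim, action_dim, K, gamma = 3, 1, 5, 0.99
critic = Critic(state_dim, action_dim)
target_critics = deque([copy.deepcopy(critic) for _ in range(K)], maxlen=K)

reward = torch.zeros(32, 1)
next_state = torch.randn(32, state_dim)
next_action = torch.randn(32, action_dim)  # would come from the target actor
with torch.no_grad():
    y = reward + gamma * averaged_target_q(target_critics, next_state, next_action)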
