Abstract

Deep Reinforcement Learning (DRL), one of the most popular research topics in artificial intelligence, has achieved breakthroughs in continuous control tasks. Nonetheless, the instability and local optimality of DRL algorithms degrade their performance. The Deep Deterministic Policy Gradient (DDPG) algorithm alleviates this problem by using a "soft" update to slow the rate of change of the target values. However, a target approximation error variance remains; this variance aggravates the dispersion of the data and reduces the stability of the model. This paper proposes DDPG with averaged state-action estimation (Averaged-DDPG), which aims to mitigate the adverse effects of this error by computing the action-value target as the average of previously learned Q-value estimates, thereby reducing fluctuation during training and improving the algorithm's performance. Evaluation results on continuous control tasks show that Averaged-DDPG improves the agent's learning efficiency and training stability more effectively than the original DDPG algorithm.
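The following is a minimal sketch, not the authors' implementation, of the averaged state-action target idea described above: the TD target is built from the mean of the Q estimates produced by the K most recently stored target critics rather than a single one. All names and hyperparameters here (Critic, K, gamma, the snapshot deque) are illustrative assumptions.

```python
import copy
from collections import deque

import torch
import torch.nn as nn


class Critic(nn.Module):
    """Simple Q(s, a) approximator used for the online critic and its snapshots."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))


def averaged_td_target(critic_snapshots, target_actor, reward, next_state,
                       done, gamma: float = 0.99) -> torch.Tensor:
    """TD target built from the average of previously learned Q estimates."""
    with torch.no_grad():
        next_action = target_actor(next_state)
        # Average the estimates of all stored target critics (the K snapshots).
        avg_q = torch.stack(
            [critic(next_state, next_action) for critic in critic_snapshots]
        ).mean(dim=0)
    # Standard DDPG-style TD target, but with the averaged Q estimate.
    return reward + gamma * (1.0 - done) * avg_q


# Maintaining the K most recent target-critic snapshots (assumed update scheme):
K = 5
snapshots = deque(maxlen=K)
critic = Critic(state_dim=3, action_dim=1)
snapshots.append(copy.deepcopy(critic))  # append a frozen copy after each target update
```

Averaging over several past estimates smooths out the target approximation error of any single critic, which is the mechanism the abstract credits for the reduced training fluctuation.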
