Abstract

Deep Reinforcement Learning (DRL) has been an active research area owing to its capability to solve large-scale control problems. To date, many algorithms have been developed, such as Deep Deterministic Policy Gradient (DDPG) and Twin-Delayed Deep Deterministic Policy Gradient (TD3). However, convergence in DRL often requires extensive data collection and many training episodes, which is data-inefficient and consumes considerable computing resources. Motivated by this problem, we propose a Twin-Delayed Deep Deterministic Policy Gradient algorithm with a Rebirth Mechanism, Tetanic Stimulation, and an Amnesia Mechanism (ATRTD3) for continuous control of a multi-DOF manipulator. During training, the weight parameters of the neural network are learned using the Tetanic Stimulation and Amnesia mechanisms. The main contribution of this paper is a biomimetic perspective that speeds up convergence by emulating the biochemical reactions of neurons in the biological brain during memory formation and forgetting. The effectiveness of the proposed algorithm is validated in a simulation study, including comparisons with previously developed DRL algorithms. The results indicate that our approach improves both convergence speed and precision.

Highlights

  • Deep Reinforcement Learning (DRL) is an advanced intelligent control method

  • We apply a neural network parameter updating mechanism based on Tetanic Stimulation and the Amnesia mechanism to the DRL algorithm to further improve efficiency in manipulator applications

  • We propose an algorithm, ATRTD3, for continuous control of a multi-DOF manipulator


Introduction

Deep Reinforcement Learning (DRL) is an advanced intelligent control method. It uses a neural network to parameterize a Markov decision process (MDP). DRL algorithms fall into two families: value-based methods, such as Deep Q Network (DQN) [8] and its Nature variant, which output discrete state-action values, and policy-based methods, which output actions directly and therefore suit continuous control. For continuous action spaces, an advanced exploration policy can improve the sampling efficiency of the underlying algorithms [10], and much research has focused on improving exploration; for example, Bellemare et al. propose a pseudo-count-based algorithm for efficient exploration. We design a new DRL algorithm, ATRTD3, based on results from neuroscience and an analysis of the memory and learning process in the human brain. We apply a neural network parameter updating mechanism based on Tetanic Stimulation and the Amnesia mechanism to the DRL algorithm to further improve efficiency in manipulator applications.
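The paper's exact update rules for Tetanic Stimulation and the Amnesia mechanism are not reproduced on this page. As a rough illustration only, the sketch below grafts two hypothetical operations onto an ordinary TD3-style gradient step: a small multiplicative boost for weights whose gradient direction is consistently reinforced (a loose analogue of tetanic stimulation strengthening a synapse) and a random decay of a small fraction of weights (a loose analogue of forgetting). Every name and constant here (`atrtd3_style_update`, `TETANIC_GAIN`, `AMNESIA_PROB`, `AMNESIA_DECAY`, `grad_trace`) is invented for illustration and is not taken from the paper.

```python
# Hypothetical sketch, NOT the authors' implementation: a TD3-style actor
# update augmented with illustrative "tetanic stimulation" and "amnesia"
# effects on the network weights. All constants are assumptions.
import torch

TETANIC_GAIN = 1.5   # assumed: boost base for consistently reinforced weights
AMNESIA_PROB = 0.01  # assumed: fraction of weights randomly decayed per step
AMNESIA_DECAY = 0.9  # assumed: shrink factor applied to "forgotten" weights

def atrtd3_style_update(actor, optimizer, actor_loss, grad_trace, trace_beta=0.9):
    """One illustrative parameter step: an ordinary gradient update, followed
    by tetanic reinforcement of persistently-used weights and random amnesia.

    grad_trace is a dict persisted across calls, mapping parameter names to
    running averages of gradient signs (a crude "how often is this synapse
    pushed the same way" signal).
    """
    optimizer.zero_grad()
    actor_loss.backward()
    optimizer.step()
    with torch.no_grad():
        for name, p in actor.named_parameters():
            if p.grad is None:
                continue
            # Track how consistently each weight is pushed in one direction.
            trace = grad_trace.setdefault(name, torch.zeros_like(p))
            trace.mul_(trace_beta).add_((1 - trace_beta) * p.grad.sign())
            # "Tetanic stimulation": slightly strengthen weights whose
            # gradient direction has been stable (|trace| near 1).
            reinforced = trace.abs() > 0.8
            p[reinforced] *= TETANIC_GAIN ** 0.01  # small multiplicative boost
            # "Amnesia": randomly shrink a small fraction of weights to zero-ward.
            forget = torch.rand_like(p) < AMNESIA_PROB
            p[forget] *= AMNESIA_DECAY
```

The design intuition, under these assumptions, is that the boost consolidates weights that many transitions agree on (memory), while the random decay prevents early experience from dominating the policy for too long (forgetting), which is one plausible reading of how such mechanisms could accelerate convergence.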

Related Work
Methods
ATRTD3
Tetanic Stimulation
Amnesia Mechanism
Experiment Setup
Simulation
Simulation Experimental Components
Discussion
Findings
Conclusions
