Abstract

This paper focuses on three practical problems of the traditional Deep Deterministic Policy Gradient (DDPG) algorithm: insufficient agent exploration, unsatisfactory neural network performance, and large fluctuations in the agent output. Addressing the agent exploration strategy, the network training algorithm, and the overall algorithm implementation, an improved DDPG method based on a double-layer BP neural network is proposed. This method introduces a fuzzy algorithm and a BFGS algorithm based on the Armijo-Goldstein criterion to improve the exploration efficiency, learning efficiency, and convergence of the BP neural network; increases the number of layers of the BP neural network to strengthen its fitting ability; and adopts a periodic update to ensure stable operation of the algorithm. The experimental results show that, after multiple rounds of self-learning under variable working conditions, the deep learning network based on the improved DDPG algorithm performs substantially better than the traditional method. This study lays a theoretical and experimental foundation for extended applications of deep learning algorithms.
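The abstract names a BFGS update combined with an Armijo-Goldstein (backtracking) line search as the network training improvement. The paper's own training code is not shown here, so the following is only a minimal sketch of the generic technique on a plain objective function; the function names (`armijo_step`, `bfgs`) and all parameter defaults are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def armijo_step(f, grad, x, p, alpha=1.0, c=1e-4, tau=0.5):
    """Backtracking line search: shrink alpha until the Armijo
    sufficient-decrease condition f(x+a*p) <= f(x) + c*a*g.p holds."""
    fx, gx = f(x), grad(x)
    while f(x + alpha * p) > fx + c * alpha * (gx @ p):
        alpha *= tau
    return alpha

def bfgs(f, grad, x0, tol=1e-6, max_iter=100):
    """Minimise f with BFGS, choosing step sizes by Armijo backtracking."""
    n = len(x0)
    H = np.eye(n)                       # inverse-Hessian approximation
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                      # quasi-Newton search direction
        alpha = armijo_step(f, grad, x, p)
        s = alpha * p                   # step actually taken
        x_new = x + s
        y = grad(x_new) - g             # change in gradient
        sy = s @ y
        if sy > 1e-12:                  # curvature guard keeps H positive definite
            rho = 1.0 / sy
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x = x_new
    return x
```

On a smooth objective, this converges much faster than plain gradient descent while the Armijo condition prevents overshooting, which matches the convergence benefit the abstract claims for BP network training.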

Highlights

  • The overall advantage of the legged robot lies in its good motion performance in unstructured environments, strong terrain adaptability and load capacity, and broad application prospects in complex working environments such as security patrol inspection, field transportation, fire rescue, and geological exploration [1]–[4]

  • The simulation results show that the control method based on the improved Deep Deterministic Policy Gradient (DDPG) algorithm can significantly improve the control effect through exploration and learning without prior knowledge, achieving performance close to that of the variable-value PID method

  • The overall control effect is similar to that of the variable-value PID method: high-precision control is achieved without downtime for debugging, with good self-adaptability. The training results under the three working conditions show that the improved DDPG control method is greatly improved over the unimproved version and that the algorithm becomes stable, so it can be safely applied in a real system
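One of the stabilising measures the abstract credits for this behaviour is a periodic update of the networks. The paper's update schedule is not reproduced here, so this is only a minimal sketch of what a periodic (hard) target-network copy might look like, assuming networks are represented as dictionaries of weight arrays; the name `periodic_target_update` and the `period` value are hypothetical.

```python
import numpy as np

def periodic_target_update(online, target, step, period=100):
    """Copy the online network's weights into the target network every
    `period` steps, keeping learning targets fixed between copies."""
    if step % period == 0:
        for name in online:
            target[name] = online[name].copy()
    return target
```

Between copies the target network is frozen, so the bootstrapped learning targets change only at scheduled intervals rather than at every gradient step, which is the usual mechanism for damping output fluctuation in DDPG-style training.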


Summary

INTRODUCTION

The overall advantage of the legged robot lies in its good motion performance in unstructured environments, strong terrain adaptability and load capacity, and broad application prospects in complex working environments such as security patrol inspection, field transportation, fire rescue, and geological exploration [1]–[4]. DDPG can be updated in single steps like DQN, and combines the high data-utilization rate of the deterministic policy gradient with good convergence. It is suitable for complex control systems with continuous state and action spaces, and can iterate on information effectively and self-improve to achieve better control performance while interacting with the system, without a model or human prior knowledge; it is an intelligent control method in the true sense. An improved DDPG algorithm based on a double-layer BP neural network is proposed and applied to the HDU position control system. In the traditional DDPG method, the action strategy to be optimized by an agent is represented by a deep neural network whose input is the system state and whose output is the corresponding action. In this way, a continuous nonlinear action space can be learned to meet the requirements of complex tasks such as multi-joint robots. Whether this method performs well in other systems remains future work.
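The introduction describes the actor as a network whose input is the system state and whose output is the corresponding continuous action, here built on a double-layer (two-hidden-layer) BP network. The paper's network sizes and weights are not given in this summary, so the following is only a structural sketch; the factory name `make_actor`, the hidden width, and the initialisation are assumptions for illustration.

```python
import numpy as np

def make_actor(state_dim, action_dim, action_bound, hidden=64, seed=0):
    """Deterministic policy with two hidden layers: maps a continuous
    state vector to a continuous action bounded by +/- action_bound."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.1, (state_dim, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, hidden))
    b2 = np.zeros(hidden)
    W3 = rng.normal(0.0, 0.1, (hidden, action_dim))
    b3 = np.zeros(action_dim)

    def actor(state):
        h1 = np.tanh(state @ W1 + b1)           # first hidden layer
        h2 = np.tanh(h1 @ W2 + b2)              # second hidden layer
        # tanh squashes the output into (-1, 1); scaling maps it
        # onto the actuator's physical range
        return action_bound * np.tanh(h2 @ W3 + b3)

    return actor
```

Because the final `tanh` bounds the output, the learned policy can never command an action outside the actuator range, which is why this output structure suits continuous position-control tasks like the HDU system described above.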

SIMULATION ON HDU POSITION CONTROL BASED ON IMPROVED DDPG ALGORITHM
Findings
CONCLUSION