Abstract

Deep deterministic policy gradient (DDPG) is a powerful reinforcement learning algorithm for large-scale continuous control. DDPG back-propagates directly from the state-action value function to the actor network's parameters, which makes the compatibility of the critic network a major challenge. Compatibility here means that policy evaluation is compatible with policy improvement. As proved for the deterministic policy gradient, a compatible function guarantees convergence but tightly restricts the form of the critic network. The complexity and limitations of the compatible function have impeded its development in DDPG. This article introduces similarity indices for neural networks, extended with gradients, to measure compatibility concretely. We represent the actor and critic networks' training datasets, trained parameters, and gradients as kernel matrices. With the sketching trick, the computation time of the similarity index is greatly reduced. Empirically, the centered kernel alignment index and the normalized Bures similarity index give consistent compatibility scores. Moreover, we demonstrate the necessity of a compatible critic network in DDPG from three aspects: 1) analyzing the policy improvement/evaluation steps; 2) conducting a theoretical analysis; and 3) presenting experimental results. Building on this, we remodel the compatible function with an energy function model, making it suitable for problems with large state-action spaces. By introducing policy change information into the critic network's optimization process, the critic network achieves higher compatibility scores and better performance. In addition, based on our experimental observations, we propose a computationally light solution to the overestimation problem. To evaluate our algorithm's performance and validate the compatibility of the critic network, we compare our algorithm with six state-of-the-art algorithms on seven PyBullet robotics environments.
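
For readers unfamiliar with the centered kernel alignment (CKA) index mentioned above, the following is a minimal sketch of the standard CKA score between two kernel (Gram) matrices. It is a generic illustration, not the article's gradient-based variant: the function names (`center_gram`, `cka`, `linear_gram`), the linear kernel, and the random inputs are all assumptions made for this example, and no sketching trick is applied.

```python
import numpy as np

def center_gram(K):
    """Double-center a Gram matrix: H K H with H = I - (1/n) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def cka(K, L):
    """Centered kernel alignment between two n x n Gram matrices."""
    Kc = center_gram(K)
    Lc = center_gram(L)
    # Frobenius inner product of the centered Gram matrices,
    # normalized by their Frobenius norms.
    hsic = np.sum(Kc * Lc)
    return hsic / (np.linalg.norm(Kc) * np.linalg.norm(Lc))

def linear_gram(X):
    """Linear-kernel Gram matrix for features X of shape (n_samples, n_features)."""
    return X @ X.T

# Hypothetical stand-ins for two networks' representations of the same inputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 8))
Y = rng.standard_normal((50, 8))
score_same = cka(linear_gram(X), linear_gram(X))
score_diff = cka(linear_gram(X), linear_gram(Y))
```

With a linear kernel the score is invariant to orthogonal transformations and isotropic scaling of the features, so two representations that differ only by a rotation score the same as identical ones.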
