Abstract

Reinforcement learning (RL) algorithms have been shown to solve a variety of continuous control tasks. However, the limited training efficiency and performance of such methods restrict their wider application. In this paper, we propose an off-policy heterogeneous actor-critic (HAC) algorithm that combines a soft Q-function with an ordinary Q-function. The soft Q-function encourages exploration by a Gaussian policy, while the ordinary Q-function optimizes the mean of the Gaussian policy to improve training efficiency. Experience replay memory is another vital component of off-policy RL methods, and we propose a new sampling technique that emphasizes recently experienced transitions to accelerate policy training. In addition, we integrate HAC with hindsight experience replay (HER) to handle sparse-reward tasks, which are common in robotic manipulation. Finally, we evaluate our methods on a series of continuous control benchmarks and robotic manipulation tasks. The experimental results show that our method outperforms prior state-of-the-art methods in both training efficiency and performance, validating the effectiveness of our approach.
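
The abstract refers to a replay sampling technique that emphasizes recently experienced transitions. The snippet below is a minimal illustrative sketch of one way such a recency bias could be realized over a circular replay buffer; the exponential weighting controlled by the `decay` parameter and the `RecencyReplayBuffer` class name are assumptions for illustration, not the paper's actual scheme.

```python
import numpy as np


class RecencyReplayBuffer:
    """Fixed-size replay memory that samples recent transitions more often.

    Illustrative sketch only: recency is modeled as an exponential weight
    over the age of each stored transition (hypothetical `decay` parameter).
    """

    def __init__(self, capacity, decay=0.999):
        self.capacity = capacity
        self.decay = decay        # closer to 1.0 -> weaker recency bias
        self.storage = []         # list of (s, a, r, s_next, done) tuples
        self.next_idx = 0         # circular write pointer

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_idx] = transition
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size):
        n = len(self.storage)
        # Age 0 is the most recently written slot; older slots get smaller weight.
        ages = (self.next_idx - 1 - np.arange(n)) % n
        weights = self.decay ** ages
        probs = weights / weights.sum()
        idx = np.random.choice(n, size=batch_size, p=probs)
        return [self.storage[i] for i in idx]
```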
