A CNN-based policy for optimizing continuous action control by learning state sequences

Tianyi Huang,Min Li,Xiaolong Qin,William Zhu

doi:10.1016/j.neucom.2021.10.004

Abstract

Continuous action control is widespread in real-world applications. It controls an agent to take action in continuous space for transiting from one state to another until achieving the desired goal. The optimization of continuous action control is an important issue, which aims to find the optimal policy for the agent to achieve the desired goal with the lowest consumption in continuous action space. A useful tool for this issue is reinforcement learning where an optimal policy is learned for the agent by maximizing the cumulative reward of the state transitions. When updating the policy at each state, most existing reinforcement learning methods consider only the one-step transition of this state. However, for each state in continuous action control, the recognizable information is usually hidden in the sequence of its previous states, thus these methods cannot learn the policy effectively enough for continuous action control. In this paper, we propose a new policy, called convolutional deterministic policy, to solve this problem. Enlightened from the convolutional neural networks used in natural language processing, our convolutional deterministic policy uses convolutional neural networks to learn the recognizable information in the state sequences. Then for each collected state, we update the convolutional deterministic policy by not only the recognizable information in the one-step transition of this state but also the recognizable information in the sequence of its previous states. As a result, our convolutional deterministic policy can make the agent take better action. Based on an effective reinforcement learning method, TD3, the implementation of our convolutional deterministic policy is in CTD3. The theoretical analysis and the experiment illustrate that our CTD3 can learn the policy not only better than but also faster than the existing RL methods for continuous action control. The source code can be downloaded from https://github.com/grcai.

Full Text