Abstract

Reinforcement learning allows learning a desired control policy in different environments without explicitly providing the system dynamics. The model-free deep Q-learning algorithm has proven efficient on a large set of discrete-action tasks. Extending this method to continuous control is usually done with actor-critic methods, which approximate the policy with an additional actor network and use the Q function to speed up the policy network's training. Another approach is to discretize the action space, which does not yield a smooth policy and is not applicable to large action spaces. A direct derivation of a continuous policy from the Q network requires optimizing the action at each inference and training step, which is computationally expensive but yields an optimal, continuous action. A time-efficient optimization over the action input of the Q function is therefore required to apply this method in practice. In this work, we implement an efficient action-derivation method that allows Q-learning to be used in real-time continuous control tasks. In addition, we test our algorithm on robotics control tasks from robotics gym environments and compare it with modern continuous RL methods. The results show that in some cases the proposed approach learns a smooth continuous policy while keeping the implementation simplicity of the original discrete-action-space Q-learning algorithm.
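As an illustration of the general idea of deriving a continuous action directly from a Q network, the following is a minimal sketch (not the authors' exact method) that maximizes Q(s, a) over the action input by gradient ascent; the network architecture, step size, iteration count, and action bounds are illustrative assumptions.

```python
# Minimal sketch: derive a continuous action from a learned Q network by
# optimizing the action input with gradient ascent. All hyperparameters
# below are assumptions for illustration only.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Q(s, a) approximator taking state and action as a joint input."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def derive_action(q_net: QNetwork, state: torch.Tensor, action_dim: int,
                  steps: int = 20, lr: float = 0.1) -> torch.Tensor:
    """Approximate argmax_a Q(s, a) by gradient ascent on the action input."""
    action = torch.zeros(state.shape[0], action_dim, requires_grad=True)
    optimizer = torch.optim.Adam([action], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = -q_net(state, action).mean()  # maximize Q => minimize -Q
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            action.clamp_(-1.0, 1.0)  # keep the action in a bounded range
    return action.detach()


# Usage: one optimization per inference step yields a continuous action.
q_net = QNetwork(state_dim=8, action_dim=2)
state = torch.randn(1, 8)
a_star = derive_action(q_net, state, action_dim=2)
```

Such an inner optimization must be fast enough to run at every inference and training step, which is the efficiency concern the abstract refers to.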
