Deep reinforcement learning has significantly advanced robot manipulation by providing an alternative approach to designing control strategies that take raw images as direct inputs. While images offer rich environmental information, end-to-end policy training (from image to action) requires the agent to learn representation and task simultaneously, which often demands a substantial number of interaction samples to achieve satisfactory policy performance. Previous works have attempted to address this challenge by learning a visual representation model that encodes the entire image into a low-dimensional vector before policy training. However, since this vector contains both robot and object information, it inevitably introduces coupling within the state, which can mislead policy training. In this study, a novel method called Reinforcement Learning with Decoupled State Representation is proposed to effectively decouple robot and object information within the state representation. Experimental results demonstrate that the proposed method learns faster and achieves superior performance compared to previous methods across various robot manipulation tasks. Moreover, with only 3096 offline images, the proposed method successfully transfers to real-world robot pushing tasks, demonstrating its high practicality.
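The abstract does not specify the encoder architecture, so the following is only a minimal illustrative sketch of the general idea of a decoupled state representation: an image encoder with two separate heads whose outputs keep robot and object information in disjoint parts of the state vector fed to the policy. All names (`DecoupledEncoder`, `robot_head`, `object_head`) and dimensions are assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

class DecoupledEncoder(nn.Module):
    """Hypothetical encoder mapping an image to two separate latent
    vectors: one for the robot, one for the object. This is a sketch
    of the decoupling idea, not the paper's architecture."""

    def __init__(self, robot_dim: int = 8, object_dim: int = 8):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        # Infer the flattened feature size for a 64x64 RGB input.
        with torch.no_grad():
            n_flat = self.conv(torch.zeros(1, 3, 64, 64)).shape[1]
        # Two separate heads keep robot and object information in
        # disjoint slices of the state vector.
        self.robot_head = nn.Linear(n_flat, robot_dim)
        self.object_head = nn.Linear(n_flat, object_dim)

    def forward(self, image: torch.Tensor):
        h = self.conv(image)
        return self.robot_head(h), self.object_head(h)

encoder = DecoupledEncoder()
image = torch.randn(1, 3, 64, 64)                # dummy RGB observation
z_robot, z_object = encoder(image)
# Concatenated but decoupled state: the policy can attend to robot
# and object information in known, separate positions.
state = torch.cat([z_robot, z_object], dim=-1)
print(state.shape)                               # torch.Size([1, 16])
```

In practice such heads would be trained with auxiliary objectives (e.g., reconstruction or supervision from proprioception) so that each latent actually captures its intended factor; the abstract does not say which objective the authors use.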