Abstract

Deep Reinforcement Learning (DRL), which can learn complex policies directly from high-dimensional observations such as images, has been successfully applied to a variety of tasks. It is therefore a promising approach for enabling robots to learn and perform daily activities such as washing and folding clothes, cooking, and cleaning, since such tasks are difficult for non-DRL methods, which typically require either (1) direct access to state variables or (2) well-designed, hand-engineered features extracted from sensory inputs. However, applying DRL to real robots remains very challenging because conventional DRL algorithms require a huge number of training samples, which are costly to collect on real hardware. To alleviate this problem, we propose two sample-efficient DRL algorithms: Deep P-Network (DPN) and Dueling Deep P-Network (DDPN). The core idea is to combine the smooth policy update of value-function-based reinforcement learning with the automatic feature extraction of deep neural networks, improving sample efficiency and learning stability when few samples are available. The proposed methods were first evaluated against previous DRL methods in a simulated robot-arm reaching task and then applied, with a limited number of samples, to two real robotic cloth manipulation tasks: (1) flipping a handkerchief and (2) folding a t-shirt. All results suggest that our methods outperformed the previous DRL methods.
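To make the "smooth policy update" idea concrete, the following is a minimal PyTorch sketch of a Dynamic-Policy-Programming-style preference backup paired with a DQN-style network and target network, which we believe is the family of update underlying DPN. The network architecture, hyperparameter values (`ETA`, `GAMMA`), and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

ETA = 1.0     # inverse temperature of the smooth update (assumed value)
GAMMA = 0.99  # discount factor (assumed value)

class PNetwork(nn.Module):
    """Small fully connected network mapping states to action preferences P(s, .).

    (Illustrative stand-in; the paper uses convolutional networks on images.)
    """
    def __init__(self, state_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)

def smooth_max(pref: torch.Tensor, eta: float = ETA) -> torch.Tensor:
    # Log-sum-exp "soft max" over actions: (1/eta) * log sum_a exp(eta * P(s,a)).
    # As eta -> infinity this approaches max_a P(s,a); a finite eta keeps
    # successive policy updates smooth.
    return torch.logsumexp(eta * pref, dim=-1) / eta

def dpp_target(p_target_net: PNetwork, s, a, r, s_next, done) -> torch.Tensor:
    # DPP-style backup: the new preference keeps the old preference (relative to
    # its soft max) as a baseline, so the induced policy changes only gradually.
    with torch.no_grad():
        p_s = p_target_net(s)                                  # P(s, .)
        p_next = p_target_net(s_next)                          # P(s', .)
        baseline = p_s.gather(1, a).squeeze(1) - smooth_max(p_s)
        return baseline + r + (1.0 - done) * GAMMA * smooth_max(p_next)

def dpn_loss(p_net: PNetwork, p_target_net: PNetwork, batch) -> torch.Tensor:
    # Standard DQN-style regression of P(s,a) toward the smooth backup target,
    # using a replay batch of (state, action, reward, next_state, done).
    s, a, r, s_next, done = batch
    pred = p_net(s).gather(1, a).squeeze(1)
    return F.mse_loss(pred, dpp_target(p_target_net, s, a, r, s_next, done))

# The behavior policy is the Boltzmann distribution over preferences:
#     pi(a|s) = softmax(ETA * P(s, .))
# e.g.:  probs = torch.softmax(ETA * p_net(s), dim=-1)
```

The log-sum-exp operator is what makes the update "smooth": because each new preference retains the previous one as a baseline, the resulting policies change gradually between iterations, which is the property the paper exploits for stability with few samples.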
