• A new CNN architecture to recognize manipulation relationships.
• A new dataset for visual manipulation relationship (VMR) recognition.
• The first end-to-end CNN for VMR recognition based on RGB images.
• (a) In real-world robotic grasping, visual manipulation relationships (VMRs) are important for planning, to prevent potential damage (e.g., if the robot neglects VMRs and grasps the book first, the cup on top of it will fall over and may break).
• (b) We present a new convolutional neural network (CNN) architecture, called Visual Manipulation Relationship Network (VMRN), to help robots detect targets and recognize VMRs in real time.

Object manipulation in object-stacking scenes is a significant but challenging skill for intelligent robots. In most cases, the relationships among objects should be considered before manipulation to prevent chaos and damage. However, the analysis of object relationships in object-stacking scenes, especially for robotic manipulation, remains unsolved. To this end, this paper presents a new convolutional neural network (CNN) architecture, called Visual Manipulation Relationship Network (VMRN), to recognize the visual manipulation relationships (VMRs) between objects in real time. By taking the manipulation relationships in object-stacking scenes into account, VMRN ensures that the robot can complete manipulation tasks safely and reliably. The core of our model is the Object Pairing Pooling Layer (OP²L), which makes it possible to recognize objects and all possible VMRs in one forward pass. Moreover, to train VMRN, we contribute a dataset named the Visual Manipulation Relationship Dataset (VMRD), consisting of 4683 images with more than 16,000 object instances and the VMRs between each object pair. The experimental results show that the proposed network architecture can detect objects and predict the VMRs between them.
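The abstract does not spell out OP²L's internals, but the stated idea is that pooled features for every object pair are scored jointly so all VMRs come out of a single forward pass. Below is a minimal PyTorch-style sketch of that pairing step; the class name, feature dimension, two-layer classifier, and three-way relation set (parent / child / no relation) are illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class ObjectPairingPooling(nn.Module):
    """Hypothetical sketch of an OP2L-style pairing layer: every ordered
    object pair (i, j) gets a relation prediction in one forward pass.
    Dimensions and the 3-way relation set are assumptions."""

    def __init__(self, feat_dim=512, num_relations=3):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(2 * feat_dim, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_relations),
        )

    def forward(self, obj_feats):
        # obj_feats: (N, feat_dim) pooled features for N detected objects
        n = obj_feats.size(0)
        # Enumerate all ordered pairs (i, j) with i != j
        idx_i, idx_j = torch.meshgrid(
            torch.arange(n), torch.arange(n), indexing="ij"
        )
        mask = idx_i != idx_j
        # Concatenate the two objects' features for each pair
        pairs = torch.cat(
            [obj_feats[idx_i[mask]], obj_feats[idx_j[mask]]], dim=1
        )
        # Relation logits for all N*(N-1) ordered pairs at once
        return self.classifier(pairs)

# Usage: relation logits for 4 detected objects
feats = torch.randn(4, 512)
logits = ObjectPairingPooling()(feats)
print(logits.shape)  # torch.Size([12, 3])
```

Because the pairing is batched over all ordered pairs, the cost of relation prediction grows with N² but still requires only one pass through the network, which is what allows joint object detection and VMR recognition in real time.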