Abstract

Robots play an increasingly important role in our daily lives. In human-centered environments, they often encounter piles of objects, packed items, or isolated objects, so a robot must be able to grasp and manipulate a wide range of objects in various situations to help humans with everyday tasks. In this paper, we propose a multi-view deep learning approach for robust object grasping in human-centric domains. In particular, our approach takes a point cloud of an arbitrary object as input and generates orthographic views of the object. The obtained views are then used to perform pixel-wise grasp synthesis for the object. We train the model end-to-end on a synthetic object grasp dataset and test it on both simulation and real-world data without any further fine-tuning. To evaluate the proposed approach, we performed extensive experiments in four everyday scenarios: isolated objects, packed items, piles of objects, and highly cluttered scenes. Experimental results show that our approach performs well in all simulation and real-robot scenarios. More specifically, it outperforms previous state-of-the-art approaches and achieves a grasp success rate above 90% in all simulated and real scenarios, except for the pile-of-objects scenario, where it reaches 82%. Additionally, our method demonstrated reliable closed-loop grasping of novel objects in a variety of scene configurations. A video of our experiments is available at: https://youtu.be/c-4lzjbF7fY.
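To make the multi-view idea more concrete, the sketch below shows one simple way to turn an object point cloud into orthographic depth views (top, front, side). This is only an illustrative approximation: the view names, image resolution, and normalization used here are assumptions and are not the paper's exact projection pipeline.

```python
import numpy as np

def orthographic_depth_views(points, resolution=64):
    """Project an (N, 3) object point cloud onto three orthographic depth
    images (top, front, side). Illustrative sketch only; the resolution and
    normalization are assumed, not taken from the paper."""
    # Center the cloud and scale it into a unit cube so all views share extents.
    pts = points - points.mean(axis=0)
    scale = np.abs(pts).max() + 1e-9
    pts = pts / (2.0 * scale) + 0.5  # map coordinates into [0, 1]

    views = {}
    # Each view keeps two axes as image coordinates and uses the third as depth:
    #   top  : image plane (x, y), depth z
    #   front: image plane (x, z), depth y
    #   side : image plane (y, z), depth x
    for name, (u, v, d) in {"top": (0, 1, 2),
                            "front": (0, 2, 1),
                            "side": (1, 2, 0)}.items():
        img = np.zeros((resolution, resolution), dtype=np.float32)
        cols = np.clip((pts[:, u] * (resolution - 1)).astype(int), 0, resolution - 1)
        rows = np.clip((pts[:, v] * (resolution - 1)).astype(int), 0, resolution - 1)
        # Keep the largest depth value per pixel (nearest surface for that view).
        np.maximum.at(img, (rows, cols), pts[:, d])
        views[name] = img
    return views

# Example: three 64x64 depth views of a random toy point cloud.
cloud = np.random.rand(2048, 3)
views = orthographic_depth_views(cloud)
print({name: img.shape for name, img in views.items()})
```

In the paper's pipeline, views such as these would then be fed to the network that predicts pixel-wise grasp configurations for each view.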
