Grasp detection via visual rotation object detection and point cloud spatial feature scoring

Jie Wang, Shuxiao Li

doi:10.1177/17298814211055577

Abstract

Accurately detecting appropriate grasp configurations is the central task in robotic grasping. Existing grasp detection methods usually overlook the depth image or treat it only as a two-dimensional distance image, which makes it difficult to capture the three-dimensional structural characteristics of the target object. In this article, we transform the depth image into a point cloud and propose a two-stage grasp detection method based on candidate grasp detection from the RGB image and spatial feature rescoring from the point cloud. Specifically, we first apply R3Det, a recently proposed high-performance rotation object detection method for aerial images, to the grasp detection task, obtaining candidate grasp boxes and their appearance scores. Then, the point cloud within each candidate grasp box is normalized and evaluated to obtain a point cloud quality score, which is fused with the established point cloud quantity scoring model to obtain a spatial score. Finally, the appearance scores and their corresponding spatial scores are combined to output high-quality grasp detection results. The proposed method effectively fuses three types of grasp scoring modules and is thus called Score Fusion Grasp Net. In addition, we propose and adopt a top-k grasp metric to effectively reflect the success rate of the algorithm in actual grasp execution. Score Fusion Grasp Net achieves 98.5% image-wise accuracy and 98.1% object-wise accuracy on the Cornell Grasp Dataset, exceeding the performance of state-of-the-art methods. We also use a robotic arm to conduct physical grasp experiments on 15 kinds of household objects and 11 kinds of adversarial objects. The results show that the proposed method maintains a high success rate when facing new objects.
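To make the pipeline in the abstract concrete, the following is a minimal sketch of two of its steps: back-projecting a depth image into a point cloud, and combining per-candidate scores before ranking. The back-projection uses the standard pinhole camera model. The fusion rule, the weight `alpha`, and all function names (`depth_to_points`, `fuse_scores`, `top_k`) are assumptions made for illustration only; the abstract does not state the actual fusion formula or the exact definition of the top-k metric.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def depth_to_points(depth: List[List[float]], fx: float, fy: float,
                    cx: float, cy: float) -> List[Point]:
    """Back-project a depth image to 3D points with the pinhole camera model.

    depth[v][u] is the depth at pixel column u, row v; fx, fy, cx, cy are
    the camera intrinsics. Non-positive depths are treated as invalid.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:  # skip missing or invalid depth readings
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


def fuse_scores(appearance: float, quality: float, quantity: float,
                alpha: float = 0.5) -> float:
    """Hypothetical fusion rule (an assumption, not the paper's formula).

    The spatial score is taken as quality * quantity, and the final score
    is a weighted average of the appearance and spatial scores.
    """
    spatial = quality * quantity
    return alpha * appearance + (1.0 - alpha) * spatial


def top_k(candidates: List[Tuple[str, float]], k: int) -> List[Tuple[str, float]]:
    """Return the k candidates with the highest fused score."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
```

In use, each candidate grasp box from the RGB stage would contribute one appearance score, the points falling inside that box would yield the quality and quantity scores, and `top_k` would select the highest-ranked grasps for evaluation or execution.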
