Abstract

An intelligent robotic grasping system should be able to automatically grasp a variety of previously unseen objects, which requires accurate and efficient grasp pose detection. To this end, we propose a deep grasp detector designed for robots equipped with a parallel gripper. The deep model consumes RGB or depth data and extracts features via a feature pyramid network (FPN), followed by multiple grasp prediction units that output grasp parameters in a single stage, without a refinement process. Attaching grasp prediction units to different FPN stages increases the model's capability to predict grasps of different sizes. Furthermore, in each prediction unit, the grasp parameters are regressed with horizontal anchors as references to overcome the challenges posed by the varied shapes of grasp regions. We improve the accuracy and efficiency of grasp rotation estimation by regressing the angle directly and encoding the angle with a continuous Gaussian-like curve during training. This encoded angle regression strategy provides distance information between different angle predictions without introducing additional computational cost. Evaluations on three datasets demonstrate the superior performance of our method over the state of the art. Experiments in real scenarios further validate the effectiveness of our grasping system.

Note to Practitioners — This paper proposes a robot system that can automatically grasp novel objects with a parallel gripper and an RGB-D camera. We focus on generating accurate grasp configurations for various objects from the captured color or depth image, which is the cornerstone of a successful grasp. To achieve effective and efficient grasp pose detection, we present a deep model that generates robust grasp poses, represented by rotated bounding boxes, for multiple novel objects. The first step of the grasp detector is to capture image features through a feature pyramid network (FPN). We then attach a separate grasp prediction unit to each FPN stage and adopt anchors as references to make the model robust to variable grasp rectangle sizes. In each grasp prediction unit, two separate subnetworks directly output the grasp rectangles and their probabilities, without an extra second stage to refine the predicted grasp regions; a minimal sketch of such a head is given below. For rotation angle prediction, we encode the rectangle angles with a continuous Gaussian-like curve during training to improve prediction accuracy. Our grasp detector is trained and tested on three datasets and validated in real-scene grasping experiments. Comparisons with state-of-the-art methods show that our model is more accurate while maintaining high efficiency. The proposed grasp detection model can generate stable grasps for novel objects with different shapes, colors, and materials, and our grasping system can operate in multiple scenarios, including homes, factories, and warehouses.
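To make the head design described above concrete, here is a minimal PyTorch sketch of a single-stage grasp prediction unit with two parallel subnetworks. All class names, layer sizes, and the anchor count are our assumptions for illustration, not the paper's exact architecture.

```python
# Hypothetical sketch of a single-stage grasp prediction unit (PyTorch).
# Names and hyperparameters are assumptions; the paper's exact head may differ.
import torch
import torch.nn as nn

class GraspPredictionUnit(nn.Module):
    """Two parallel subnetworks attached to one FPN level: one regresses
    grasp rectangles (x, y, w, h, theta) relative to horizontal anchors,
    the other scores graspability."""
    def __init__(self, in_channels=256, num_anchors=3):
        super().__init__()
        self.reg_head = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, num_anchors * 5, 3, padding=1),  # 5 grasp params per anchor
        )
        self.cls_head = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, num_anchors, 3, padding=1),      # grasp confidence per anchor
        )

    def forward(self, feat):
        return self.reg_head(feat), self.cls_head(feat)

# One unit per FPN stage, so each level can specialize in a grasp size range:
# coarse levels for large grasp rectangles, fine levels for small ones.
fpn_levels = [torch.randn(1, 256, s, s) for s in (64, 32, 16)]
units = nn.ModuleList(GraspPredictionUnit() for _ in fpn_levels)
outputs = [unit(feat) for unit, feat in zip(units, fpn_levels)]
```

The Gaussian-like angle encoding can be read as a distance-aware training target: angles near the ground truth receive targets close to 1, decaying smoothly with angular distance, so the loss reflects how far off a prediction is rather than treating all misses equally. The sketch below is one plausible reading, assuming a 1-degree discretization over a 180-degree period and a hand-picked width sigma; the paper's exact formulation may differ.

```python
# Hypothetical sketch of Gaussian-like angle target encoding (NumPy).
# The bin count, period, and sigma are assumptions for illustration.
import numpy as np

def encode_angle(theta_gt_deg, num_bins=180, sigma=6.0):
    """Encode a ground-truth grasp angle as a smooth, periodic
    Gaussian-like curve over discretized angles."""
    bins = np.arange(num_bins)  # one bin per degree in [0, 180)
    # circular distance: a rectangle at 179 deg is close to one at 1 deg
    diff = np.minimum(np.abs(bins - theta_gt_deg),
                      num_bins - np.abs(bins - theta_gt_deg))
    return np.exp(-0.5 * (diff / sigma) ** 2)

target = encode_angle(30.0)
print(target[28:33])  # values near the 30-degree peak stay close to 1
```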
