FANet: Fast and Accurate Robotic Grasp Detection Based on Keypoints

Di-Hua Zhai, Sheng Yu, Yuanqing Xia

doi:10.1109/tase.2023.3272664

Abstract

In practice, real-time performance and accuracy are two of the most important metrics for robotic grasp detection. Previous work has typically sacrificed the real-time performance of the detection network to obtain higher detection accuracy, so how to achieve both at once is a problem worth studying. To solve it, this paper proposes FANet, a network based on grasp keypoints that improves the accuracy of grasp detection while preserving real-time performance. The key question is how to detect grasp keypoints quickly and accurately. To this end, we propose a local refinement module that optimizes and de-duplicates each feature of the multi-scale feature map, enabling the network to make full use of the multi-scale features. We also propose a global feature refinement module that allows the network to make better use of global features, and a grasp keypoint optimization module that predicts the offset between the actual keypoints and the predicted keypoints, enabling the network to predict the keypoints more accurately. Moreover, we develop two FANets specifically for grasp detection on CPU and GPU, both of which achieve real-time grasp detection in real-world scenes. We train and test FANet on the Cornell and Jacquard datasets, achieving state-of-the-art results on the Jacquard dataset. We also test FANet on a dataset of unknown objects, with good results. Finally, we use FANet in grasping experiments with a real Baxter robot and achieve an average grasping success rate of 96%.

Note to Practitioners: Real-time performance and accuracy are two very important metrics in robotic grasp detection. Achieving high accuracy often requires more time for feature extraction. Conversely, improving real-time performance requires reducing the time spent on feature extraction, which may lower detection accuracy. How to balance the two, so as to have both, is a problem worth investigating. Current methods tend to prioritize accuracy and accept longer processing times to achieve it. But in some practical scenarios, such as factory assembly lines, objects move fast and the network must detect the grasping position quickly; real-time performance matters more there, which makes some methods difficult to use. In addition, most current methods are GPU-based, and in real-world scenarios such a powerful GPU may not be available. In contrast, the CPU is an indispensable part of any computer and can process images without a high-performance GPU. However, compared with GPUs, CPUs have weaker image processing capability, which makes real-time processing difficult. Achieving both real-time performance and high accuracy in a CPU-only robotic grasp detection network is therefore worth studying, yet most existing methods ignore this problem. To address these problems, we propose a Fast and Accurate robotic grasp detection Network (FANet), which not only combines real-time performance with accuracy but also enables real-time detection on either CPU or GPU.
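As a rough illustration of the offset idea behind the grasp keypoint optimization module, the sketch below refines a coarse keypoint with a predicted offset. It is not FANet's implementation: the heatmap representation, the per-cell offset maps, the `stride` parameter, and the function name `refine_keypoint` are all assumptions made for this example, following a common keypoint-detection pattern rather than anything stated in the abstract.

```python
from typing import List, Tuple

Grid = List[List[float]]


def refine_keypoint(heatmap: Grid, offset_x: Grid, offset_y: Grid,
                    stride: int = 4) -> Tuple[float, float, float]:
    """Return (x, y, score) in input-image pixels for the strongest peak.

    Assumed setup (hypothetical, not from the paper): the network outputs a
    low-resolution keypoint heatmap plus two offset maps that give, per cell,
    the predicted shift between the coarse cell position and the true keypoint.
    """
    # Locate the coarse keypoint: the highest-scoring heatmap cell.
    best_row, best_col, best_score = 0, 0, heatmap[0][0]
    for r, row in enumerate(heatmap):
        for c, score in enumerate(row):
            if score > best_score:
                best_row, best_col, best_score = r, c, score

    # Add the predicted offset, then map back to input-image resolution.
    x = (best_col + offset_x[best_row][best_col]) * stride
    y = (best_row + offset_y[best_row][best_col]) * stride
    return x, y, best_score


# Toy 3x3 example: the peak sits at row 1, column 1.
heatmap = [[0.1, 0.2, 0.1],
           [0.2, 0.9, 0.3],
           [0.1, 0.3, 0.2]]
offset_x = [[0.0] * 3, [0.0, 0.25, 0.0], [0.0] * 3]
offset_y = [[0.0] * 3, [0.0, 0.5, 0.0], [0.0] * 3]

print(refine_keypoint(heatmap, offset_x, offset_y))  # (5.0, 6.0, 0.9)
```

Without the offset, the peak would map to pixel (4, 4); the predicted offset shifts it to (5.0, 6.0), which is the kind of correction that recovers precision lost by predicting on a downsampled feature map.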

Full Text