Grasping objects of irregular shape and varying size remains a key challenge in robotic grasping. This paper proposes a novel grasp pose prediction network for RGB-D data, termed the Cascaded Feature Fusion Grasping Network (CFFGN), designed to be lightweight and fast. The network combines four structural elements: depth-wise separable convolutions, which reduce the parameter count and computational cost; convolutional block attention modules, which help the model focus on key features; multi-scale dilated convolutions, which expand the receptive field and capture multi-scale information; and bidirectional feature pyramid modules, which fuse features from different levels and let information flow between them. On the Cornell dataset, the network predicts grasp poses at 66.7 frames per second, with accuracies of 98.6% and 96.9% on the image-wise and object-wise splits, respectively. These results show that the method achieves high-speed processing while maintaining high accuracy. In real-world experiments on a robot equipped with parallel grippers, the method also proved effective, achieving an average grasp success rate of 95.6%.
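To illustrate why depth-wise separable convolutions reduce the parameter count, the following minimal sketch compares the number of weights in a standard convolution with that of a depth-wise convolution followed by a 1 x 1 pointwise convolution. The layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) and the function names are hypothetical, chosen only for illustration; they are not taken from CFFGN, and biases are ignored.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depth-wise k x k conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes for illustration only; not taken from CFFGN.
c_in, c_out, k = 64, 128, 3

standard = standard_conv_params(c_in, c_out, k)
separable = depthwise_separable_params(c_in, c_out, k)
ratio = separable / standard  # equals 1 / c_out + 1 / k**2

print(standard, separable, round(ratio, 3))  # prints: 73728 8768 0.119
```

Under these assumed sizes, the separable layer needs roughly 12% of the weights of the standard layer, and the ratio 1/c_out + 1/k^2 shows that the saving is governed mainly by the kernel size once the output channel count is large.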