Abstract

Grasp detection is fundamental to successful grasping in robotic systems. The encoder–decoder structure has been widely adopted as the foundational architecture for grasp detection networks because of its speed and accuracy. However, traditional network structures fail to extract the features essential for accurate grasp poses and do not eliminate the checkerboard artifacts caused by transposed convolution during decoding. To overcome these challenges, we propose a novel generative grasp detection network (LGAR-Net2). A bilinear interpolation layer replaces the transposed convolution layer in the decoder, which removes the uneven overlap and consequently eliminates checkerboard artifacts. In addition, a loss-guided collaborative attention block (LGCA), which combines attention blocks with spatial pyramid blocks to strengthen attention to important regions of the image, is constructed to improve the accuracy of information extraction. Validated on the Cornell public dataset with RGB images as the input, LGAR-Net2 achieves an accuracy of 97.7%, an improvement of 1.1% over the baseline network, and processes a single RGB image in just 15 ms.
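
Checkerboard artifacts of the kind mentioned above are commonly attributed to uneven kernel overlap in strided transposed convolution, whereas interpolation followed by an ordinary convolution covers every output position equally. The toy 1D sketch below illustrates that general effect only. It is not the LGAR-Net2 decoder; the function names, the all-ones kernel, and the stride of 2 are assumptions chosen for illustration.

```python
def transposed_conv1d(signal, kernel, stride):
    """Naive 1D transposed convolution: scatter each input sample,
    weighted by the kernel, into the strided output."""
    out = [0.0] * ((len(signal) - 1) * stride + len(kernel))
    for i, x in enumerate(signal):
        for j, w in enumerate(kernel):
            out[i * stride + j] += x * w
    return out


def linear_upsample1d(signal, factor):
    """1D analogue of bilinear interpolation (end points preserved)."""
    n_out = (len(signal) - 1) * factor + 1
    out = []
    for p in range(n_out):
        left, frac = divmod(p, factor)
        right = min(left + 1, len(signal) - 1)
        t = frac / factor
        out.append((1 - t) * signal[left] + t * signal[right])
    return out


def conv1d_same(signal, kernel):
    """Stride-1 convolution with edge-replication padding (odd kernel length)."""
    half = len(kernel) // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [
        sum(w * padded[p + j] for j, w in enumerate(kernel))
        for p in range(len(signal))
    ]


# A perfectly flat input: any pattern in the output comes from the operator.
flat = [1.0] * 5
ones3 = [1.0, 1.0, 1.0]  # hypothetical kernel, chosen to expose overlap counts

deconv_out = transposed_conv1d(flat, ones3, stride=2)
resize_out = conv1d_same(linear_upsample1d(flat, 2), ones3)

print("transposed conv:", deconv_out)
print("upsample + conv:", resize_out)
```

With a kernel of length 3 and stride 2, interior output positions of the transposed convolution alternate between one and two contributing kernel taps, so a flat input yields an alternating output. The interpolate-then-convolve path returns a flat output for the same input.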
