With the development of the automation industry, robotic-arm and vision applications are no longer limited to the fixed actions of the past. Production lines increasingly require recognizing and grasping objects in complex environments, with an emphasis on quick setup and stable operation. This paper introduces a rapidly constructed hand-eye system for robotic-arm grasping that enables fast and efficient object manipulation, particularly for stacked objects. First, images captured by a camera were used to generate extensive datasets from a limited number of source images. Objects were then detected and segmented using deep learning networks for object detection and instance segmentation, and three-dimensional position information was obtained from an RGB-D camera. Finally, object poses were determined from plane normal vectors, with gripping positions marked manually; this reduced the time required for grasp-point identification, model training, and pose localization. Experimental results show that the proposed grasping procedure is suitable for a variety of object-grasping scenarios, achieving picking success rates of 96% for unstacked annular objects and 90.86% for annular objects in a random bin. In a final experiment, after filtering the depth information, a success rate of 95.1% was attained for random-bin picking of annular objects.
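As a rough illustration of the plane-normal pose step summarized above, the sketch below fits a least-squares plane to a patch of 3-D points back-projected from an RGB-D depth map and builds a grasp frame whose z-axis is the plane normal. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the camera-facing normal convention, and the choice of reference axis are all illustrative.

```python
import numpy as np


def estimate_grasp_pose(points: np.ndarray):
    """Fit a plane to a 3-D point patch and return (position, rotation).

    points : (N, 3) array of points back-projected from the RGB-D depth
             map for one segmented object. Returns the patch centroid and
             a 3x3 rotation whose z-axis is the outward plane normal.
    """
    centroid = points.mean(axis=0)
    # SVD of the centered points: the right singular vector with the
    # smallest singular value is the least-squares plane normal.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    # Orient the normal toward the camera (assumed to look along +z).
    if normal[2] > 0:
        normal = -normal
    # Build an orthonormal frame with the normal as the z-axis.
    ref = np.array([1.0, 0.0, 0.0])
    if abs(normal @ ref) > 0.9:  # avoid a degenerate cross product
        ref = np.array([0.0, 1.0, 0.0])
    x_axis = np.cross(ref, normal)
    x_axis /= np.linalg.norm(x_axis)
    y_axis = np.cross(normal, x_axis)
    rotation = np.column_stack([x_axis, y_axis, normal])
    return centroid, rotation
```

A manually marked grip point on the segmented mask could then be expressed in this frame to obtain the final gripper target.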