Robot visual servoing for grasping has long been difficult to execute in complex visual environments because efficient feature extraction is challenging. This paper proposes a novel visual servoing grasping approach based on the Deep Visual Servoing Feature Network (DVSFN) to tackle this issue. The approach makes it feasible to extract scale-invariant point features and target bounding boxes in real time by building an efficient single-stage multi-dimensional feature extractor. The DVSFN is then integrated into a Levenberg–Marquardt-based image-based visual servoing (LM-IBVS) controller, establishing a mapping between image features and the robot's joint space. The robot is then guided in positioning and grasping by converting the error between the desired and current features into the corresponding robot joint velocities. Experimental results demonstrate that the proposed method achieves a mean average precision (mAP) of 0.80 and 0.87 for detecting target bounding boxes and point features, respectively, in scenarios with significant lighting variations and occlusions. Under low-light and partial occlusion conditions, the method achieves an average grasping success rate of approximately 80%.
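For context, a minimal sketch of the classical Levenberg–Marquardt IBVS control law is shown below; the symbols ($\mathbf{s}$, $\mathbf{s}^{*}$, $\mathbf{L}_s$, $\lambda$, $\mu$) follow the standard formulation and are assumptions, not necessarily the paper's exact notation or implementation.

% Assumed classical LM-IBVS form: e is the feature error, L_s the image
% interaction (feature) Jacobian, lambda a control gain, mu the LM damping factor;
% the resulting velocity command drives the error toward zero.
\[
  \mathbf{e} = \mathbf{s} - \mathbf{s}^{*}, \qquad
  \mathbf{H} = \mathbf{L}_s^{\top}\mathbf{L}_s, \qquad
  \dot{\mathbf{q}} = -\lambda \left( \mathbf{H} + \mu\,\operatorname{diag}(\mathbf{H}) \right)^{-1} \mathbf{L}_s^{\top}\, \mathbf{e}
\]

Here the damping term $\mu\,\operatorname{diag}(\mathbf{H})$ blends gradient-descent and Gauss–Newton behavior, which is what distinguishes the LM scheme from a plain pseudo-inverse IBVS law; mapping the commanded velocity into joint space additionally involves the robot Jacobian, omitted here for brevity.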