Abstract

Traditional methods for estimating the pose of space targets rely on hand-crafted features to match the transformation relationship between the image and the object model. With the rapid development of deep learning, approaches based on deep neural networks (DNNs) have significantly improved pose estimation performance. However, current methods still suffer from complex computation, low accuracy, and poor real-time performance. Therefore, a new pose estimation algorithm is proposed in this paper. First, a mask image of the target is obtained with an instance segmentation algorithm. Then, its point cloud is computed from the depth map combined with the camera parameters. Finally, correlations among points are established to predict the pose through multi-modal feature fusion. Experimental results on the YCB-Video dataset show that the proposed algorithm can recognize complex images at a speed of about 24 images per second with an accuracy above 80%. In conclusion, the proposed algorithm achieves fast pose estimation for complex stacked objects and remains stable across different objects.
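
As a concrete illustration of the second step, back-projecting the masked depth map into a point cloud using the camera parameters, the following is a minimal sketch assuming a pinhole camera model; the function name and the intrinsics fx, fy, cx, cy are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def depth_to_point_cloud(depth, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into a 3D point cloud.

    depth : (H, W) depth map in meters
    mask  : (H, W) boolean instance mask from the segmentation stage
    fx, fy, cx, cy : pinhole camera intrinsics
    Returns an (N, 3) array of points in the camera coordinate frame.
    """
    v, u = np.nonzero(mask)            # pixel rows/cols inside the mask
    z = depth[v, u]
    valid = z > 0                      # discard missing depth readings
    u, v, z = u[valid], v[valid], z[valid]
    x = (u - cx) * z / fx              # standard pinhole back-projection
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=1)
```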

Highlights

  • Target pose estimation refers to establishing the transformation relationship between the world coordinate system and the camera coordinate system, from which the 3D position and pose of an object can be accurately estimated (see the sketch after this list)

  • To study the effectiveness of the pose estimation network in space-robot grasping applications, this paper evaluates the algorithm on the YCB-Video dataset

  • The YCB-Video dataset contains 133,827 distinct scenes depicting a variety of object instances, with complete annotation information. It is a representative, wide-coverage, and challenging dataset in the field of pose estimation, and it closely resembles actual space-robot grasping scenes
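
To make the first highlight concrete: a pose is the rigid transform (R, t) that maps a point p_w in the world coordinate system to p_c = R p_w + t in the camera coordinate system, so a 6D pose comprises 3 rotational plus 3 translational degrees of freedom. A minimal sketch in NumPy (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def apply_pose(points_world, R, t):
    """Map 3D points from the world frame into the camera frame.

    points_world : (N, 3) array of points in the world coordinate system
    R : (3, 3) rotation matrix; t : (3,) translation vector
    Implements p_c = R @ p_w + t for every point (row) at once.
    """
    return points_world @ R.T + t
```

Estimating the pose is the inverse problem: given observations of the object in the camera frame, recover the (R, t) that best explains them.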

Summary

Introduction

Target pose estimation refers to establishing the transformation relationship between the world coordinate system and the camera coordinate system, from which the 3D position and pose of an object can be accurately estimated. Traditional pose estimation methods are mainly based on local feature operators [1]. Such methods are only suitable for objects with texture information and fail on weakly textured objects. More recent approaches are instead built on deep neural networks; what these methods have in common is that they extend existing object detection algorithms and estimate the 6D pose of an object in an RGB image by calculating the offset between the object and its bounding box. However, these algorithms extract information from RGB images alone, and the lack of depth information severely limits their application in complex environments [6].
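
RGB-only methods of this kind typically recover the final 6D pose from predicted 2D-3D correspondences with a Perspective-n-Point (PnP) solver. The sketch below uses OpenCV's cv2.solvePnP with made-up keypoints and intrinsics; it illustrates that general technique, not this paper's method:

```python
import cv2
import numpy as np

# Illustrative only: four coplanar keypoints on the object model (meters)
# and their hypothetical predicted pixel locations in the RGB image.
object_points = np.array([[0.0, 0.0, 0.0],
                          [0.1, 0.0, 0.0],
                          [0.1, 0.1, 0.0],
                          [0.0, 0.1, 0.0]], dtype=np.float64)
image_points = np.array([[310.0, 230.0],
                         [380.0, 235.0],
                         [375.0, 305.0],
                         [305.0, 300.0]], dtype=np.float64)
K = np.array([[600.0,   0.0, 320.0],   # assumed pinhole intrinsics
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)             # rotation vector -> 3x3 matrix
print(ok, tvec.ravel())                # estimated 6D pose: (R, tvec)
```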

