Abstract

The accurate estimation of three-dimensional (3D) object pose is important in a wide range of applications, such as robotics and augmented reality. The key to estimating object poses is matching feature points in the captured image with predefined ones on the 3D model of the object. Existing learning-based pose estimation systems use a voting strategy over a vector field to localize feature points and thereby improve the accuracy of the estimated pose. However, the loss function of such approaches accounts only for the direction of the vectors, which makes the localization of feature points error-prone. This paper therefore introduces a projection loss function that handles the error of the vector field, and incorporates a refinement network that revises the predicted pose to yield a better final output. Experimental results show that the proposed methods outperform state-of-the-art methods in terms of the ADD(-S) metric on the LINEMOD and Occlusion LINEMOD datasets. Moreover, the proposed method can be applied to practical real-world scenarios in real time to estimate the poses of multiple objects simultaneously.
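
To make the direction-only limitation concrete, the sketch below contrasts a vector-field loss supervised only on per-pixel directions, as in PVNet-style voting, with a hypothetical projection-style term that also penalizes where the voted keypoints land in the image. It is a minimal PyTorch illustration under assumed tensor shapes, not the paper's exact formulation.

```python
# Minimal sketch (PyTorch): direction-only loss vs. a projection-style term.
# Tensor shapes and weighting are assumptions for illustration only.
import torch
import torch.nn.functional as F

def direction_loss(pred_field, gt_field, mask):
    """Smooth-L1 on per-pixel keypoint direction vectors (PVNet-style).
    pred_field, gt_field: (B, 2K, H, W); mask: (B, 1, H, W) object mask.
    Supervises only vector direction, not where the voted keypoint lands."""
    return F.smooth_l1_loss(pred_field * mask, gt_field * mask)

def projection_loss(pred_kpts_2d, gt_kpts_2d):
    """Hypothetical projection term: mean pixel distance between keypoints
    voted from the predicted field and the ground-truth 2D keypoints.
    pred_kpts_2d, gt_kpts_2d: (B, K, 2) pixel coordinates."""
    return (pred_kpts_2d - gt_kpts_2d).norm(dim=-1).mean()
```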

Highlights

  • The main purpose of object pose estimation is to describe the relationship between the object coordinate system and the world coordinate system

  • Proposed method: a pose estimation system built on top of PVNet [9] that incorporates a projection loss and a discriminative refinement network to achieve strong performance

  • Experimental setup: three pose estimation systems evaluated in different settings, combining the projection loss function, the refinement network, and the discriminative strategy


Summary

Introduction

The main purpose of object pose estimation is to describe the relationship between the object coordinate system and the world coordinate system. Traditional object pose estimation methods can be roughly categorized into two-dimensional (2D) [1]–[3] and 3D [4]–[6] methods. The former detects feature points in 2D images and solves for the rotation and translation matrices using Perspective-n-Point (PnP) algorithms (a minimal example is sketched below). The latter registers the point cloud of a predefined template model to the depth image using the Iterative Closest Point (ICP) algorithm [7].
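
As an illustration of the 2D pipeline, the snippet below recovers the rotation and translation from matched 2D–3D keypoint correspondences with OpenCV's PnP solver. The keypoint coordinates and camera intrinsics are placeholder values, not data from the paper.

```python
# Sketch: recover object pose from 2D-3D keypoint matches via PnP (OpenCV).
# All numeric values below are placeholders for illustration.
import cv2
import numpy as np

object_points = np.array([      # predefined 3D keypoints on the model (meters)
    [0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1], [0.1, 0.1, 0.0], [0.1, 0.0, 0.1],
], dtype=np.float64)
image_points = np.array([       # matched 2D detections in the image (pixels)
    [320.0, 240.0], [400.0, 238.0], [322.0, 160.0],
    [318.0, 300.0], [402.0, 158.0], [398.0, 298.0],
], dtype=np.float64)
K = np.array([[600.0, 0.0, 320.0],   # placeholder camera intrinsics
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, K, None)
R, _ = cv2.Rodrigues(rvec)           # rotation vector -> 3x3 rotation matrix
print("success:", ok, "\nR =\n", R, "\nt =\n", tvec)
```

The 3D analog replaces PnP with ICP, aligning the model point cloud to the depth-derived point cloud instead of solving from 2D correspondences.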

