Abstract

Accurate and efficient object pose estimation is an indispensable part of virtual/augmented reality (VR/AR) and many other applications. Whereas previous works regress the 6D pose directly from RGB and depth images, and therefore suffer from the non-linearity of the rotation space, we propose an iterative 3D keypoint voting network, named KVNet. Specifically, our method decouples the pose into separate translation and rotation branches, each estimated with a Hough voting scheme. By treating the uncertainty of each keypoint vote as a Lipschitz-continuous function of the seed points' fused embedding features, our method adaptively selects the optimal keypoint votes. In this way, we argue that KVNet bridges the gap between the non-linear rotation space and the linear Euclidean space, introducing an inductive bias that helps the network learn the intrinsic pattern and infer the 6D pose from RGB and depth images. Furthermore, our model refines the initial keypoint localization in an iterative fashion. Experiments on three challenging benchmark datasets (LineMOD, YCB-Video and Occlusion LineMOD) show that our method achieves excellent performance.
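To make the voting idea concrete, below is a minimal NumPy sketch of keypoint-based pose recovery: seed points cast votes for a 3D keypoint, the votes are aggregated with a confidence-weighted mean (a simple stand-in for the adaptive, uncertainty-aware vote selection described above, not the paper's actual KVNet architecture), and the 6D pose is then recovered from the voted keypoints with a least-squares (Kabsch) fit. All function names and the aggregation rule are illustrative assumptions.

import numpy as np

def vote_keypoint(seeds, offsets, confidences):
    # Each seed casts a vote for the keypoint location (seed + predicted offset).
    # Votes are aggregated by a confidence-weighted mean, a simple proxy for
    # uncertainty-aware Hough vote selection.
    votes = seeds + offsets                      # (N, 3) candidate keypoint positions
    w = confidences / confidences.sum()          # normalize per-seed confidences
    return (w[:, None] * votes).sum(axis=0)      # (3,) aggregated keypoint

def kabsch_pose(model_kps, pred_kps):
    # Least-squares rigid transform (Kabsch algorithm) mapping the keypoints
    # defined in the object's model frame to their voted camera-frame locations;
    # this recovers the rotation R and translation t, i.e. the 6D pose.
    mu_m, mu_p = model_kps.mean(axis=0), pred_kps.mean(axis=0)
    H = (model_kps - mu_m).T @ (pred_kps - mu_p)  # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))        # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_p - R @ mu_m
    return R, t

# Hypothetical usage: aggregate one keypoint from noisy seed votes, then fit a pose
# from several keypoints (here the per-seed offsets and confidences would come
# from the network's prediction heads).
seeds = np.random.rand(128, 3)
offsets = np.random.randn(128, 3) * 0.01
conf = np.random.rand(128)
kp = vote_keypoint(seeds, offsets, conf)
model_kps = np.random.rand(8, 3)
pred_kps = model_kps @ np.eye(3).T + np.array([0.1, 0.0, 0.2])
R, t = kabsch_pose(model_kps, pred_kps)

One motivation for such a keypoint-then-fit pipeline is exactly the point made in the abstract: the voting and aggregation steps live entirely in linear Euclidean space, and the non-linear rotation is only recovered at the end by a closed-form least-squares step.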
