Abstract

Tracking an object’s 6D pose is important in various real-world applications, such as robotic grasping, virtual reality, and self-driving. However, existing methods struggle with the following challenges: i) difficult geometric shapes, such as rotationally symmetric objects, and ii) complex scenes, such as cluttered backgrounds and occlusion. To tackle these problems, we propose FCR-TrackNet, a novel tracking network that combines a residual iterative framework with low- and high-level feature fusion and joint classification-regression. FCR-TrackNet takes as input an RGB-D image rendered from the previous frame’s pose together with the current RGB-D image, from which it extracts low-level features. High-level features are then obtained by the convolutional network and fused with the low-level ones to capture subtle variations in the target object’s features across adjacent frames. To reduce computational complexity and ensure high tracking speed, we adopt decoupled branches that estimate the translation and rotation of the pose independently. Finally, a joint classification-regression scheme is designed to address the boundary problem of the rotation angle, and we introduce smooth classification labels that effectively enhance the accuracy of rotation vector classification. We evaluate FCR-TrackNet on two well-known datasets, YCB-Video and YCBInEOAT, where it achieves state-of-the-art ADD values of 94.5% and 93.2% and ADD-S values of 96.7% and 96.0%, respectively, at a tracking speed of 89.6 Hz. It also outperforms competing algorithms when tracking occluded or rotationally symmetric objects. These quantitative and qualitative results validate the high performance of FCR-TrackNet on 6D pose tracking.
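
The decoupled-branch design described above can be illustrated with a minimal sketch in PyTorch. The feature dimension, layer sizes, and per-axis angle-bin count below are assumptions for illustration only; the abstract states just that translation and rotation are estimated by independent branches and that rotation combines bin classification with regression.

import torch
import torch.nn as nn

class DecoupledPoseHeads(nn.Module):
    """Hypothetical decoupled translation/rotation branches.

    The translation branch regresses a 3D offset directly; the rotation
    branch jointly classifies an angle bin per axis and regresses a
    residual offset within that bin (a joint classification-regression).
    """
    def __init__(self, feat_dim=512, num_bins=360):
        super().__init__()
        self.trans_head = nn.Sequential(          # direct 3D translation regression
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 3))
        self.rot_cls = nn.Sequential(             # per-axis angle-bin classification
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * num_bins))
        self.rot_reg = nn.Linear(feat_dim, 3)     # residual offset within the chosen bin
        self.num_bins = num_bins

    def forward(self, fused_feat):
        t = self.trans_head(fused_feat)                                 # (B, 3)
        rot_logits = self.rot_cls(fused_feat).view(-1, 3, self.num_bins)
        rot_offset = self.rot_reg(fused_feat)                           # (B, 3)
        return t, rot_logits, rot_offset

Because the two branches share only the fused feature vector, each head can stay small, which is consistent with the reported 89.6 Hz tracking speed.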
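
The boundary problem arises because the rotation angle wraps around (0° and 360° describe the same pose), so hard one-hot bin labels penalize predictions that land just across the boundary. A smooth classification label can be built by evaluating a Gaussian on the circular distance to each bin centre; the bin count and bandwidth below are illustrative assumptions, not values from the paper.

import torch

def smooth_circular_labels(angles_deg, num_bins=360, sigma=2.0):
    """Gaussian soft labels over angle bins with wraparound.

    angles_deg: (B,) tensor of target angles in degrees.
    sigma is measured in bins (an assumed value). Mass near 0°/360°
    spills into the neighbouring bins on the other side of the
    boundary instead of being cut off.
    """
    bin_width = 360.0 / num_bins
    centers = (torch.arange(num_bins, dtype=torch.float32) + 0.5) * bin_width
    diff = (angles_deg.unsqueeze(-1) - centers) % 360.0   # (B, num_bins)
    diff = torch.minimum(diff, 360.0 - diff)              # circular distance
    labels = torch.exp(-0.5 * (diff / (sigma * bin_width)) ** 2)
    return labels / labels.sum(dim=-1, keepdim=True)      # normalise to a distribution

For example, smooth_circular_labels(torch.tensor([0.0])) places equal mass on the first and last bins, so a prediction of 359.5° is no longer treated as maximally wrong for a 0° target.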
