Abstract

Tracking the 6-degree-of-freedom (6D) pose of an object through a video sequence is gaining attention because of its wide applications in multimedia and robotic manipulation. However, current methods often perform poorly in challenging scenes involving an incorrect initial pose, sudden re-orientation, or severe occlusion. In contrast, we present a robust 6D object pose tracking method with a novel hierarchical feature fusion network, referred to as HFF6D, which predicts the object's relative pose between adjacent frames. Instead of extracting features from adjacent frames separately, HFF6D establishes sufficient spatial-temporal information interaction between them. In addition, we propose a novel subtraction feature fusion (SFF) module with an attention mechanism that leverages feature subtraction during fusion. It explicitly highlights the feature differences between adjacent frames, improving the robustness of relative pose estimation in challenging scenes. Furthermore, we leverage data augmentation so that HFF6D can be deployed effectively in the real world while being trained only on synthetic data, reducing the manual effort of data annotation. We evaluate HFF6D on the well-known YCB-Video and YCBInEOAT datasets. Quantitative and qualitative results demonstrate that HFF6D outperforms state-of-the-art (SOTA) methods in both accuracy and efficiency. Moreover, it is shown to achieve high-robustness tracking in the challenging scenes mentioned above.

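The abstract does not include implementation details, so the following is only a minimal PyTorch sketch of what a subtraction-based feature fusion module with channel attention might look like. All names and shapes here (`SFFModule`, the channel width, the squeeze-and-excitation style gate) are illustrative assumptions, not the authors' published code.

```python
import torch
import torch.nn as nn


class SFFModule(nn.Module):
    """Hypothetical sketch of a subtraction feature fusion (SFF) module.

    Fuses per-point features from two adjacent frames by explicitly
    computing their difference and re-weighting it with a channel
    attention gate, so that inter-frame changes are highlighted.
    This is an illustrative guess at the mechanism described in the
    abstract, not the authors' actual implementation.
    """

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        # Squeeze-and-excitation style channel attention on the difference map.
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool1d(1),                      # (B, C, N) -> (B, C, 1)
            nn.Conv1d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Project the concatenated features back to the original width.
        self.fuse = nn.Conv1d(channels * 2, channels, 1)

    def forward(self, feat_prev: torch.Tensor, feat_curr: torch.Tensor) -> torch.Tensor:
        # Explicit feature subtraction highlights what changed between frames.
        diff = feat_curr - feat_prev
        # Attention weights emphasize channels with salient inter-frame change.
        weighted_diff = diff * self.attention(diff)
        # Fuse current-frame features with the attended difference.
        return self.fuse(torch.cat([feat_curr, weighted_diff], dim=1))


# Usage sketch: per-point features for two adjacent frames.
if __name__ == "__main__":
    sff = SFFModule(channels=128)
    prev = torch.randn(2, 128, 1024)   # (batch, channels, points)
    curr = torch.randn(2, 128, 1024)
    fused = sff(prev, curr)
    print(fused.shape)                 # torch.Size([2, 128, 1024])
```

Under these assumptions, the subtraction branch gives the relative-pose head a signal that is already dominated by inter-frame motion, which is consistent with the abstract's claim that highlighting feature differences improves robustness to sudden re-orientation and occlusion.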