Abstract

Random forest-based methods for 3D temporal tracking over an image sequence have gained increasing prominence in recent years. They do not require the object's texture and use only raw depth images and the previous pose as input, which makes them especially suitable for textureless objects. These methods learn built-in occlusion handling from predetermined occlusion patterns, which cannot always model real occlusions. Moreover, the input to the random forest contains more and more outliers as the occlusion deepens. In this paper, we propose an occlusion-aware framework capable of real-time, robust 3D pose tracking from RGB-D images. The proposed framework is anchored in the random forest-based learning strategy, referred to as RFtracker. We aim to enhance its performance in two respects: integrated local refinement of the random forest, and online rendering-based occlusion handling. To eliminate the inconsistency between learning and prediction in RFtracker, a local refinement step is embedded to guide the random forest towards the optimal regression. Furthermore, we present online rendering-based occlusion handling to improve robustness against dynamic occlusion. In addition, a lightweight convolutional neural network-based motion-compensated (CMC) module is designed to cope with fast motion and the inevitable physical delay caused by imaging frequency and data transmission. Finally, experiments show that the proposed framework copes better with heavily occluded scenes than RFtracker while preserving real-time performance.
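The abstract does not spell out how the online rendering-based occlusion handling works. One common way to picture the idea, offered here only as a sketch under assumptions and not as the paper's implementation, is to render the object model's depth at the previous pose and compare it with the observed depth: pixels where the sensor sees something clearly in front of the model are treated as occluded and excluded from the input to the regressor. The function names, the `margin` tolerance and the convention that `0.0` means "no depth" are all hypothetical.

```python
from typing import List

Depth = List[List[float]]  # row-major depth image in metres


def occlusion_mask(observed: Depth, rendered: Depth,
                   margin: float = 0.02) -> List[List[bool]]:
    """Flag pixels where something sits in front of the rendered model.

    observed: sensor depth (0.0 = no reading; assumed convention).
    rendered: model depth rendered at the previous pose
              (0.0 = the model does not cover this pixel; assumed convention).
    margin:   hypothetical tolerance for sensor noise and pose error.
    """
    mask = []
    for obs_row, ren_row in zip(observed, rendered):
        row = []
        for obs, ren in zip(obs_row, ren_row):
            on_model = ren > 0.0   # pixel is covered by the rendered model
            valid = obs > 0.0      # sensor returned a depth value
            # Occluded if the observation is clearly closer than the model.
            row.append(on_model and valid and obs < ren - margin)
        mask.append(row)
    return mask


def inlier_depths(observed: Depth, rendered: Depth,
                  margin: float = 0.02) -> List[float]:
    """Return observed depths on the model that are not flagged as occluded."""
    mask = occlusion_mask(observed, rendered, margin)
    return [obs
            for obs_row, ren_row, m_row in zip(observed, rendered, mask)
            for obs, ren, m in zip(obs_row, ren_row, m_row)
            if ren > 0.0 and obs > 0.0 and not m]


if __name__ == "__main__":
    observed = [[0.50, 0.80], [0.79, 0.0]]   # a hand at 0.50 m covers one pixel
    rendered = [[0.80, 0.80], [0.80, 0.0]]   # model rendered at 0.80 m
    print(occlusion_mask(observed, rendered))
    print(inlier_depths(observed, rendered))
```

In this toy input only the top-left pixel is flagged, because its observed depth (0.50 m) is well in front of the rendered model (0.80 m). The remaining on-model samples would be the ones passed on as inliers.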

Highlights

  • 3D pose tracking serves as a cornerstone for numerous computer vision applications, such as augmented reality [4], robotic interaction [5] and medical navigation [6]

  • Related strategies can be roughly divided into three categories according to the data type of input: (1) RGB image-based methods [13,20]; (2) depth image-based methods [16,17]; (3) RGB-D-based methods [21,22,23]

  • For 3D pose tracking from RGB images, prior shape knowledge is often a necessary input

Summary

Introduction

3D pose tracking serves as a cornerstone for numerous computer vision applications, such as augmented reality [4], robotic interaction [5] and medical navigation [6]. Before the advent of RGB-D sensors, early pose tracking mostly adopted a template matching-based strategy [7] or correspondences between natural landmarks [8]. By making it easy to capture 3D information about the scene, consumer RGB-D cameras have broken fresh ground for the more rapid development of pose tracking [9,10,11,12,13]. Pose tracking was derived by optimizing a global appearance-based energy. Based on this probabilistic model, an energy function using multiple local appearance models [13] was proposed to capture the spatial variation in statistical properties.

