Abstract
Accurate pose estimation of planar objects is a key computation in visual localization tasks, with recent studies showing remarkable progress on a handful of baseline datasets. Nonetheless, achieving similar performance on sequences in unconstrained environments remains an open problem, largely because several sources of error are correlated yet often only partly addressed in the literature. In this article, we propose POP, a generic real-time planar-object pose-estimation framework designed to handle these sources of error without being tied to a specific choice of keypoint detection or tracking algorithm. The essence of POP lies in activating a keypoint detection module in the background and adding several refinement steps to reduce correlated sources of error within the pipeline. We provide extensive experimental evaluations against state-of-the-art planar object tracking algorithms on baseline and more challenging datasets, empirically demonstrating the effectiveness of the POP framework for scenes with large environmental variations.
Highlights
Object pose estimation is central to many applications in computer vision and robotics, namely surveillance, robot manipulation and augmented reality (AR) [11], [44].
This work is limited to using off-the-shelf tracking modules, and errors generated within the tracking process are not refined. We believe this is pioneering work in considering multiple sources of error in a single pipeline, and we hope it will provide a useful direction for future research on reducing correlated errors in planar object tracking.
We compared the tracking accuracies achieved by different combinations of feature detectors and descriptors when using planar object pose-estimation (POP) to demonstrate the generic nature of the framework.
Summary
Object pose estimation is central to many applications in computer vision and robotics, namely surveillance, robot manipulation and augmented reality (AR) [11], [44]. Major baseline and state-of-the-art methods adopt this framework, albeit with various modifications, with recent work [25], [44] achieving average tracking accuracies of roughly 80–99% on several public datasets. Such positive results do not always carry over to unconstrained scenes with large environmental variations [44], which remains one of the open challenges for this type of method. State-of-the-art results for planar objects are still achieved by more traditional modular pipelines [44], primarily because of the lack of training data available for end-to-end platforms, which require large variations in backgrounds and viewpoints to learn the correct representation for planar objects [38]. This is the major bottleneck for current deep learning-based methods, especially in AR, where scenes can exhibit large environmental changes. We believe this is pioneering work in considering multiple sources of error in a single pipeline, and we hope it will provide a useful direction for future research on reducing correlated errors in planar object tracking.