Abstract

This paper presents a model-based approach for 3D pose estimation from a single RGB image, with the goal of keeping a 3D scene model up-to-date using a low-cost camera. A prelearned image model of the target scene is first reconstructed from a training RGB-D video. Next, the model is analyzed using the proposed multiple principal plane analysis to label the viewpoint class of each training RGB image and construct a training dataset for a deep viewpoint classification neural network (DVCNN). For all training images in a viewpoint class, the DVCNN estimates their membership probabilities and defines the template of the class as the image with the highest probability. To reconstruct the scene in 3D space using a camera, a pose estimation algorithm then uses the template information to estimate the pose parameters and depth map of a single RGB image captured by navigating the camera to a specific viewpoint. This pose estimation algorithm is the key to updating the status of the 3D scene. Compared with conventional pose estimation algorithms, which rely on sparse features, our approach enhances the quality of the reconstructed 3D scene point cloud through template-to-frame registration. Finally, we verify the established reconstruction system on publicly available benchmark datasets and compare it with state-of-the-art pose estimation algorithms. The results indicate that our approach outperforms the compared methods in terms of pose estimation accuracy.
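The abstract's template-selection step (choosing, for each viewpoint class, the training image whose membership probability is highest) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the probability matrix, class index, and image identifiers are hypothetical stand-ins for the DVCNN's softmax outputs.

```python
import numpy as np

def select_template(probs, class_id, image_ids):
    """Pick the training image with the highest membership probability
    for its own viewpoint class as the class template.

    probs:     (N, C) array, row i = classifier probabilities for image i
    class_id:  index of the viewpoint class being processed
    image_ids: list of N identifiers for the training images
    """
    scores = probs[:, class_id]      # membership probability for class_id
    best = int(np.argmax(scores))    # most confident image of this class
    return image_ids[best], float(scores[best])

# Hypothetical probabilities for three training images over three classes.
probs = np.array([[0.70, 0.20, 0.10],
                  [0.90, 0.05, 0.05],
                  [0.60, 0.30, 0.10]])
template, score = select_template(probs, class_id=0,
                                  image_ids=["img_a", "img_b", "img_c"])
print(template, score)  # img_b 0.9
```

In a full pipeline this selection would run once per viewpoint class, yielding one template image (with its depth map) per class for the later template-to-frame registration.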

Highlights

  • To compare the performance of our approach with other state-of-the-art methods in pose estimation [39,40,52,53], five typical datasets are selected as test samples

  • The contributions of the proposed method include: (1) starting from the computation of the point cloud of an input RGB-D image, the multiple principal plane analysis (MPPA) automatically labels training image frames to prepare training datasets for deep viewpoint classification; (2) the fusion of multiple model-specific convolutional neural networks (CNNs) can accurately detect multiple objects under surveillance; (3) the viewpoint classifier searches for the best template images for constructing a high-quality image-based 3D model; (4) in the model-updating phase, the use of template-based 3D models speeds up pose estimation from a single RGB image

  • In our experiments on publicly available datasets, we show that our approach gives the lowest overall trajectory error and outperforms the state-of-the-art methods
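The template-to-frame registration named in the abstract is not detailed on this page. As an illustration of the underlying idea, the rigid transform aligning a template's 3D points to the corresponding points of a new frame can be recovered in closed form with the Kabsch/SVD method. This is a generic sketch under the assumption of known point correspondences, not the paper's specific algorithm; the synthetic pose below is fabricated purely to exercise the code.

```python
import numpy as np

def rigid_registration(src, dst):
    """Least-squares rotation R and translation t with dst ≈ R @ src + t,
    via the Kabsch/SVD method. src, dst: (N, 3) corresponding 3D points."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic check: transform a template cloud by a known pose and recover it.
rng = np.random.default_rng(0)
template_pts = rng.standard_normal((50, 3))
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -0.2, 1.0])
frame_pts = template_pts @ R_true.T + t_true
R_est, t_est = rigid_registration(template_pts, frame_pts)
```

In practice the correspondences would come from matching the frame against the selected viewpoint template, typically inside a robust loop (e.g. RANSAC or ICP iterations) to handle outliers.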


Introduction

In the field of three-dimensional (3D) computer vision, researchers aim at quickly reconstructing 3D models for applications such as augmented reality (AR), geodesy, remote sensing, 3D face recognition, drone or vehicle navigation, and 3D printing. Researchers in remote sensing provide two traditional 3D reconstruction techniques: airborne image photogrammetry [1] and light detection and ranging (LiDAR) [2]. Although these techniques produce high-quality 3D models, their acquisition cost is very high. With the advances of low-cost cameras, image-based alternatives have been developed, including multi-view stereo (MVS) [4,5,6], photo tourism [7], virtual reality modeling [8], and an RGB-D video-based method [9]

