Abstract

In this work, we present a novel method to learn a local cross-domain descriptor for 2D image and 3D point cloud matching. Our proposed method is a dual auto-encoder neural network that maps 2D and 3D inputs into a shared latent space representation. We show that such local cross-domain descriptors in the shared embedding are more discriminative than those obtained from individual training in the 2D and 3D domains. To facilitate the training process, we built a new dataset by collecting ≈1.4 million 2D-3D correspondences under various lighting conditions and settings from publicly available RGB-D scenes. Our descriptor is evaluated in three main experiments: 2D-3D matching, cross-domain retrieval, and sparse-to-dense depth estimation. Experimental results confirm the robustness of our approach as well as its competitive performance not only on cross-domain tasks but also when generalizing to single-domain 2D and 3D tasks. Our dataset and code are released publicly at https://hkust-vgd.github.io/lcd.
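
To make the architecture concrete, below is a minimal PyTorch sketch of a dual auto-encoder with a shared latent space: a convolutional 2D branch for image patches and a PointNet-style 3D branch for point-cloud patches, coupled by per-domain reconstruction losses plus a cross-domain alignment loss. The layer sizes, the 256-D descriptor dimension, and the loss formulation are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of a dual auto-encoder mapping 2D patches and 3D point-cloud
# patches into one shared latent space. All sizes are assumptions.
import torch
import torch.nn as nn

class PatchAutoEncoder(nn.Module):
    """2D branch: encodes a 64x64 RGB patch to a unit-norm descriptor."""
    def __init__(self, dim=256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(dim, 128 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (128, 8, 8)),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = nn.functional.normalize(self.encoder(x), dim=1)
        return z, self.decoder(z)

class PointAutoEncoder(nn.Module):
    """3D branch: encodes N xyz+rgb points (PointNet-style) to the same space."""
    def __init__(self, dim=256, n_points=1024):
        super().__init__()
        self.n_points = n_points
        self.encoder = nn.Sequential(
            nn.Conv1d(6, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, dim, 1),
        )
        self.decoder = nn.Sequential(
            nn.Linear(dim, 512), nn.ReLU(),
            nn.Linear(512, n_points * 6),
        )

    def forward(self, pts):                      # pts: (B, 6, N)
        z = self.encoder(pts).max(dim=2).values  # global max-pool over points
        z = nn.functional.normalize(z, dim=1)
        rec = self.decoder(z).view(-1, 6, self.n_points)
        return z, rec

if __name__ == "__main__":
    # Reconstruction losses keep each branch faithful to its own domain;
    # an alignment loss on matching pairs ties the two embeddings together.
    img = torch.rand(4, 3, 64, 64)
    pts = torch.rand(4, 6, 1024)
    net2d, net3d = PatchAutoEncoder(), PointAutoEncoder()
    z2d, rec2d = net2d(img)
    z3d, rec3d = net3d(pts)
    loss = (nn.functional.mse_loss(rec2d, img)
            + nn.functional.mse_loss(rec3d, pts)
            + nn.functional.mse_loss(z2d, z3d))  # cross-domain alignment
    loss.backward()
```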

Highlights

  • Computer vision tasks such as structure-from-motion and visual content retrieval require robust descriptors in both the 2D and 3D domains

  • We evaluate the performance of our 2D descriptor on the task of image matching

  • The dense depth prediction is calculated by projecting the dense 3D point cloud back to the image plane (see the sketch after this list)
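
The last highlight amounts to a standard pinhole back-projection with a z-buffer. Below is a small NumPy sketch of that step; the intrinsics (fx, fy, cx, cy) and image size are assumed example values, not taken from the paper.

```python
# Project a dense point cloud (camera coordinates) onto the image plane,
# keeping the nearest point per pixel, to obtain a dense depth map.
import numpy as np

def project_to_depth(points, fx, fy, cx, cy, height, width):
    """points: (N, 3) xyz in the camera frame; returns an (H, W) depth map."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    valid = z > 0                                   # keep points in front of the camera
    u = np.round(fx * x[valid] / z[valid] + cx).astype(int)
    v = np.round(fy * y[valid] / z[valid] + cy).astype(int)
    z = z[valid]
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z = u[inside], v[inside], z[inside]
    depth = np.full((height, width), np.inf)
    np.minimum.at(depth, (v, u), z)                 # z-buffer: nearest point wins
    depth[np.isinf(depth)] = 0.0                    # 0 marks pixels with no point
    return depth

depth = project_to_depth(np.random.rand(10000, 3) + [0.0, 0.0, 1.0],
                         fx=525.0, fy=525.0, cx=319.5, cy=239.5,
                         height=480, width=640)
```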



Introduction

Computer vision tasks such as structure-from-motion and visual content retrieval require robust descriptors in both the 2D and 3D domains. Such descriptors, in their own domain, can be constructed from low-level features, e.g., colors, edges, etc. With the advent of deep learning, many robust 2D descriptors are learned automatically using deep neural networks (Simo-Serra et al. 2015; Kumar et al. 2016). These learned descriptors have shown their robustness and advantages over their hand-crafted counterparts. Hand-crafted 3D descriptors, e.g., FPFH (Rusu, Blodow, and Beetz 2009) and SHOT (Tombari, Salti, and Di Stefano 2010), as well as deep learning based descriptors (Zeng et al. 2017), have been used in many 3D tasks, such as 3D registration (Choi, Zhou, and Koltun 2015; Zhou, Park, and Koltun 2016) and structure-from-motion (Hartley and Zisserman 2003).
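
For reference, hand-crafted descriptors such as FPFH can be computed in a few lines with the Open3D library (assuming Open3D ≥ 0.10, where the registration module lives under o3d.pipelines; the input file and search radii below are illustrative).

```python
# Compute 33-D FPFH descriptors for every point of a cloud with Open3D.
import open3d as o3d

pcd = o3d.io.read_point_cloud("scene.ply")          # hypothetical input file
pcd.estimate_normals(                               # FPFH requires surface normals
    o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))
fpfh = o3d.pipelines.registration.compute_fpfh_feature(
    pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=0.25, max_nn=100))
print(fpfh.data.shape)                              # (33, num_points)
```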

