CT-Loc: Cross-domain visual localization with a channel-wise transformer

Daeho Kim, Jaeil Kim

doi:10.1016/j.neunet.2022.11.014

Abstract

We tackle the cross-domain visual localization problem of estimating camera position and orientation from real images without three-dimensional (3D) spatial mapping or modeling. Recent studies have shown suboptimal performance in this task owing to the photometric and geometric differences between synthetic and real images. In this study, we present a deep learning approach that uses a channel-wise transformer localization (CT-Loc) framework. Inspired by the human behavior of looking for structural landmarks to estimate one’s location, CT-Loc encodes the most salient features of task-relevant objects in target scenes. To evaluate the efficacy of the proposed method in a real-world application, we built a complex, large-scale dataset of the interior of a mechanical room during operation and conducted extensive performance comparisons on the publicly available state-of-the-art University of Melbourne Corridor and Virtual KITTI 2 datasets. Compared with the otherwise best-performing BIM-PoseNet indoor camera localization model, our method significantly reduces position and orientation errors through the application of attention weights and saliency maps, while learning only the visual structural patterns (e.g., floors and doors) that are most relevant to localization tasks and ignoring uninformative objects. This approach yields robust, higher-level camera-pose regression results without requiring prebuilt maps. The code is available at https://github.com/kdaeho27/CT-Loc.
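
The abstract reports position and orientation errors but does not define them. As a minimal sketch, assuming the convention common in camera-pose regression work (Euclidean distance between predicted and ground-truth positions, and the rotation angle between predicted and ground-truth unit quaternions), the two errors could be computed as follows. The function names and the (w, x, y, z) quaternion layout are illustrative assumptions and are not taken from the CT-Loc code.

```python
import math


def position_error(t_pred, t_true):
    """Euclidean distance between predicted and true camera positions."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(t_pred, t_true)))


def _normalize(q):
    """Scale a quaternion to unit length."""
    norm = math.sqrt(sum(c * c for c in q))
    return [c / norm for c in q]


def orientation_error_deg(q_pred, q_true):
    """Angle in degrees of the rotation separating two orientations.

    Quaternions are assumed to be in (w, x, y, z) order. The absolute
    value of the dot product treats q and -q as the same rotation.
    """
    qp, qt = _normalize(q_pred), _normalize(q_true)
    dot = abs(sum(a * b for a, b in zip(qp, qt)))
    dot = min(1.0, dot)  # guard against rounding slightly above 1
    return math.degrees(2.0 * math.acos(dot))


if __name__ == "__main__":
    # Hypothetical poses, for illustration only.
    print(position_error((0.0, 0.0, 0.0), (3.0, 4.0, 0.0)))
    half = math.sqrt(0.5)
    print(orientation_error_deg((half, 0.0, 0.0, half), (1.0, 0.0, 0.0, 0.0)))
```

Taking the absolute value of the quaternion dot product matters because a regression network may output either sign for the same orientation; without it, an otherwise correct prediction could be scored as a large error.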


Journal: Neural Networks
Publication Date: Nov 19, 2022
Citations: 11

Similar Papers

Deep learning-based mobile augmented reality for task assistance using 3D spatial mapping and snapshot-based RGB-D data
Kyeong-Beom Park ... Jae Yeol Lee
Computers & Industrial Engineering | VOL. 146
10 Jun 2020

Hybrid CGH by Digitized Holography: CGH for Mixed 3D Scene of Virtual and Real Objects
Yasuaki Arima ... Kyoji Matsushima
01 Jan 2010

Multiview segmented filter for multicorrelation: application to 3D face recognition
A Alfalou ... C Brosseau
17 Sep 2009

A Virtual-Real Interaction Approach to Object Instance Segmentation in Traffic Scenes
Hui Zhang ... Yonglin Tian
IEEE Transactions on Intelligent Transportation Systems | VOL. 22
03 Jan 2020
