Abstract
LiDAR-camera alignment (LCA) is an important preprocessing step for fusing LiDAR and camera data. A key issue is extracting a unified cross-modality representation that characterizes the heterogeneous LiDAR and camera data effectively and robustly. The main challenge is to resist the modality gap and visual data degradation during feature learning while maintaining strong representational power. To address this, a novel modality-adapted local-to-global representation learning method is proposed. The research effort proceeds along two main lines: modality adaptation and capturing global spatial context. First, to resist the modality gap, LiDAR and camera data are projected into the same depth-map domain for unified representation learning. In particular, LiDAR data is converted to a depth map according to the pre-acquired extrinsic parameters. Thanks to recent advances in deep-learning-based monocular depth estimation, camera data is transformed into a depth map in a data-driven manner, jointly optimized with LCA. Second, to capture global spatial context, a vision transformer (ViT) is introduced to LCA. The concept of an LCA token is proposed to aggregate local spatial patterns into a global spatial representation via transformer encoding. The LCA token is shared by all samples, and thus involves global sample-level information to improve generalization ability. Experiments on the KITTI dataset verify the superiority of our proposition. Furthermore, the proposed approach is more robust to the camera data degradation (e.g., imaging blur and noise) often encountered in practical applications. Under some challenging test cases, the performance improvement of our method exceeds 1.9 cm / 4.1° in translation / rotation error, while our model size (8.77M parameters) is much smaller than that of existing methods (e.g., 66.75M for LCCNet). The source code will be released at https://github.com/Zaf233/RLCA upon acceptance.
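To make the two core ideas concrete, the following is a minimal sketch (not the authors' implementation) of (a) projecting LiDAR points into the camera image plane with pre-acquired extrinsics and intrinsics to obtain a sparse depth map, and (b) a small transformer encoder in which a learnable, sample-shared "LCA token" aggregates local patch features into a global representation. All function and module names here are hypothetical illustrations of the abstract's description.

```python
import torch
import torch.nn as nn


def lidar_to_depth_map(points, T_cam_lidar, K, hw):
    """Project LiDAR points (N, 3) into an (H, W) sparse depth map using the
    pre-acquired extrinsics T_cam_lidar (4, 4) and camera intrinsics K (3, 3)."""
    H, W = hw
    # Transform points from the LiDAR frame to the camera frame.
    pts_h = torch.cat([points, torch.ones(points.shape[0], 1)], dim=1)   # (N, 4)
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]                           # (N, 3)
    z = pts_cam[:, 2]
    front = z > 1e-3                                                     # keep points in front of the camera
    uvz = (K @ pts_cam[front].T).T                                       # perspective projection
    u = (uvz[:, 0] / uvz[:, 2]).long()
    v = (uvz[:, 1] / uvz[:, 2]).long()
    z = z[front]
    inside = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    depth = torch.zeros(H, W)
    depth[v[inside], u[inside]] = z[inside]                              # sparse depth values, zeros elsewhere
    return depth


class LCATokenEncoder(nn.Module):
    """Transformer encoder with one learnable token (shared by all samples)
    that aggregates local patch features into a global spatial representation."""

    def __init__(self, dim=128, depth=4, heads=4):
        super().__init__()
        self.lca_token = nn.Parameter(torch.zeros(1, 1, dim))            # shared across all samples
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, patch_tokens):                                     # patch_tokens: (B, N, dim)
        B = patch_tokens.shape[0]
        tokens = torch.cat([self.lca_token.expand(B, -1, -1), patch_tokens], dim=1)
        return self.encoder(tokens)[:, 0]                                # global representation: (B, dim)
```

In this sketch, camera data would first pass through a monocular depth estimator so that both modalities are encoded from depth maps; the global features output by the LCA token could then feed a regression head that predicts the translation and rotation correction.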