Abstract

3D object detection in point clouds aims to simultaneously localize and recognize 3D objects from a 3D point set. However, since point clouds are usually sparse, unordered, and irregular, it is challenging to learn robust point representations and to sample high-quality object queries. To address these issues, we propose a Long-short rangE Adaptive transformer with Dynamic sampling (LeadNet), which integrates a point representation encoder, a dynamic object query sampling decoder, and an object detection decoder in a unified architecture for 3D object detection. Specifically, in the point representation encoder, we combine an attention layer with a channel attentive kernel convolution layer to capture the local structure and the long-range context simultaneously. In the dynamic object query sampling decoder, we utilize multiple dynamic prototypes to adapt to various point clouds. In the object detection decoder, we incorporate a dynamic Gaussian weight map into the cross-attention mechanism so that the decoder focuses on the visual regions near each object, which further accelerates training. Extensive experiments on two standard benchmarks show that LeadNet outperforms the 3DETR baseline by 11.6% mAP50 on ScanNet v2 and achieves new state-of-the-art results among geometric-only approaches on the ScanNet v2 and SUN RGB-D benchmarks.
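
The abstract does not specify how the Gaussian weight map enters the cross-attention computation. The snippet below is a minimal sketch of one plausible realization, assuming the map is derived from each query's predicted object center and combined with the attention logits in log space; the function name, the `sigma` bandwidth parameter, and all tensor names are hypothetical, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def gaussian_weighted_cross_attention(queries, keys, values,
                                      centers, key_xyz, sigma=0.3):
    """Cross-attention biased by a Gaussian weight map (illustrative sketch).

    queries:  (B, Q, D) object query features
    keys:     (B, N, D) encoded point features
    values:   (B, N, D) encoded point features
    centers:  (B, Q, 3) predicted object center per query (assumed input)
    key_xyz:  (B, N, 3) 3D coordinates of the point tokens
    sigma:    spread of the Gaussian around each predicted center
    """
    d = queries.size(-1)
    # standard scaled dot-product attention logits: (B, Q, N)
    attn_logits = queries @ keys.transpose(1, 2) / d ** 0.5
    # squared distance from each query's center to every point token
    dist2 = ((centers.unsqueeze(2) - key_xyz.unsqueeze(1)) ** 2).sum(-1)
    # Gaussian map peaked near the predicted center, flat far away
    gauss = torch.exp(-dist2 / (2 * sigma ** 2))
    # add in log space so the map rescales attention multiplicatively
    attn = F.softmax(attn_logits + torch.log(gauss + 1e-6), dim=-1)
    return attn @ values  # (B, Q, D) attended features per query


# usage with random tensors, shapes chosen arbitrarily for illustration
B, Q, N, D = 2, 16, 1024, 256
out = gaussian_weighted_cross_attention(
    torch.randn(B, Q, D), torch.randn(B, N, D), torch.randn(B, N, D),
    torch.randn(B, Q, 3), torch.randn(B, N, 3))
```

Under this reading, the Gaussian bias suppresses attention to points far from a query's current center estimate, which is one way such a prior could steer the decoder toward the proper regions early in training.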
