Abstract

Most existing LiDAR-camera-based 3D object detectors rely on two heavy neural networks to extract view-specific features, and a LiDAR-camera-based 3D detector with only one neural network has not yet been implemented. To tackle this issue, this paper presents an early-fusion method that exploits both LiDAR and camera data for fast 3D object detection with only one backbone, achieving a good balance between accuracy and efficiency. We propose a novel point feature fusion module that directly extracts point-wise features from the raw RGB image and fuses them with the corresponding point cloud without an image backbone. In this paradigm, the backbone that would extract RGB image features is removed, greatly reducing the computational cost. Our method first voxelizes a point cloud into a 3D voxel grid and applies two strategies to reduce information loss during voxelization: using a small voxel size of (0.05 m, 0.05 m, 0.1 m) along the X-, Y-, and Z-axes, and projecting point-cloud features (e.g., intensity or height) onto the RGB image. Extensive experiments on the KITTI benchmark suite show that the proposed approach outperforms state-of-the-art LiDAR-camera-based methods on the three classes in 3D performance (Easy, Moderate, Hard): cars (88.04%, 77.60%, 76.23%), pedestrians (66.65%, 60.49%, 54.51%), and cyclists (75.87%, 60.07%, 54.51%). Additionally, the proposed model runs at 17.8 frames per second (FPS), almost 2× faster than state-of-the-art LiDAR-camera fusion methods.
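The core idea of the point feature fusion module is to project each LiDAR point onto the image plane and sample the RGB value beneath it, instead of running a second backbone over the image. Below is a minimal sketch of that projection-and-sampling step, assuming KITTI-style calibration matrices; the function name and nearest-neighbour sampling are illustrative, not the authors' exact implementation.

```python
import numpy as np

def fuse_point_rgb(points, image, P2, Tr_velo_to_cam, R0_rect):
    """Append the RGB value under each projected LiDAR point.

    points: (N, 4) array of [x, y, z, intensity] in LiDAR coordinates.
    image:  (H, W, 3) RGB image, float in [0, 1].
    P2: 3x4 camera projection matrix; Tr_velo_to_cam and R0_rect:
    KITTI calibration matrices extended to homogeneous 4x4 form.
    Returns an (M, 7) array of [x, y, z, intensity, r, g, b] for the
    points that fall inside the image.
    """
    # Homogeneous LiDAR coordinates -> rectified camera coordinates.
    xyz1 = np.hstack([points[:, :3], np.ones((points.shape[0], 1))])  # (N, 4)
    cam = R0_rect @ Tr_velo_to_cam @ xyz1.T                           # (4, N)
    # Perspective projection onto the image plane.
    uvw = P2 @ cam                                                    # (3, N)
    u = uvw[0] / uvw[2]
    v = uvw[1] / uvw[2]
    h, w = image.shape[:2]
    # Keep points in front of the camera that land inside the image.
    keep = (uvw[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    # Nearest-neighbour sample of the pixel under each projected point.
    rgb = image[v[keep].astype(int), u[keep].astype(int)]
    return np.hstack([points[keep], rgb])
```

The fused (M, 7) points can then be voxelized and fed to the single 3D backbone, which is what lets the image backbone be dropped entirely.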

Highlights

  • With the rapid development of autonomous vehicles, three-dimensional (3D) object detection, which aims to perceive the size and accurate location of objects in the real world, has become increasingly important.

  • This paper proposes a highly efficient point-wise feature fusion module that directly extracts an RGB image point feature for each point in the point cloud and fuses it with the corresponding point-cloud feature.

  • The related work reviews recent applications of convolutional neural networks (CNNs) to LiDAR-based 3D object detection, focusing on multi-modal methods that detect objects from point clouds and RGB images.

Summary

INTRODUCTION

With the rapid development of autonomous vehicles, three-dimensional (3D) object detection, which aims to perceive the size and accurate location of objects in the real world, has become increasingly important. LiDAR is employed to collect the surrounding 3D data, referred to as a point cloud, while the camera captures a high-resolution RGB image. Extracting and fusing the features of the point cloud and RGB image both efficiently and quickly is non-trivial. Before the advent of highly efficient graphics processing units (GPUs), representative studies [5]–[10] converted point clouds into 2D dense images or structured voxel-grid representations and used 2D neural networks to extract features from the converted images.

To tackle this issue, this paper presents an early-fusion method for fast 3D object detection with only one backbone, achieving a good balance between accuracy and efficiency. It enhances 3D object detection with an RGB+ image, which preserves the information projected from its corresponding point cloud. The presented one-stage, multi-class 3D object detection framework outperforms state-of-the-art LiDAR-camera-based methods on the KITTI benchmark [18] in terms of both speed and accuracy.
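To make the RGB+ image concrete: the abstract states that point-cloud features such as intensity or height are projected onto the RGB image, so one plausible construction is to scatter those attributes into extra channels aligned with the pixels. The sketch below assumes the pixel coordinates and validity mask come from a projection step like fuse_point_rgb above; the helper name and channel layout are hypothetical, not the paper's exact pipeline.

```python
import numpy as np

def make_rgb_plus(points, image, u, v, keep):
    """Stack projected point attributes onto the RGB image.

    points: (N, 4) [x, y, z, intensity] LiDAR points.
    image:  (H, W, 3) RGB image, float in [0, 1].
    u, v:   (N,) pixel coordinates of each projected point.
    keep:   (N,) boolean mask of points landing inside the image.
    Returns an (H, W, 5) RGB+ image: RGB plus intensity and height.
    """
    h, w = image.shape[:2]
    extra = np.zeros((h, w, 2), dtype=image.dtype)
    ui = u[keep].astype(int)
    vi = v[keep].astype(int)
    extra[vi, ui, 0] = points[keep, 3]  # reflectance/intensity channel
    extra[vi, ui, 1] = points[keep, 2]  # height (z in the LiDAR frame)
    return np.concatenate([image, extra], axis=2)
```

Pixels with no projected point simply keep zeros in the extra channels; the resulting 5-channel image preserves the point-cloud information that would otherwise be lost when sampling RGB values alone.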

RELATED WORK
LOSS FUNCTION
DATASET
EXPERIMENTAL SETTINGS
ABLATION STUDIES
FINDINGS
CONCLUSION