Abstract

In this paper, we propose a point cloud based 3D object detection framework, named MuRF-Net, that accounts for both contextual and local information by leveraging multi-receptive field pillars. Common recent pipelines can be divided into a voxel-based feature encoder followed by an object detector. During feature encoding, contextual information, which is critical for 3D object detection, is neglected, so the encoded features are ill-suited as input to the subsequent detector. To address this challenge, MuRF-Net uses a multi-receptive field voxelization mechanism that captures both contextual and local information. After voxelization, the voxelized points (pillars) are processed by a feature encoder, and a channel-wise feature reconfiguration module combines the features from different receptive fields through a lateral enhanced fusion network. In addition, to offset the extra memory and computational cost introduced by multi-receptive field voxelization, a dynamic voxel encoder is applied that takes advantage of the sparsity of the point cloud. Experiments on the KITTI benchmark for both 3D object and Bird's Eye View (BEV) detection on the car class show that MuRF-Net achieves state-of-the-art results among voxel-based methods. Moreover, MuRF-Net runs at a near-real-time speed of 20 Hz.
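To illustrate the idea behind multi-receptive field, sparsity-aware voxelization, the sketch below pillarizes the same point cloud at several pillar sizes, storing only non-empty pillars in a dictionary so memory scales with occupied cells rather than the full grid. This is a minimal illustration, not the paper's implementation: the pillar sizes, BEV ranges, and function names are illustrative assumptions.

```python
import numpy as np

def dynamic_pillarize(points, pillar_size, x_range=(0.0, 70.4), y_range=(-40.0, 40.0)):
    """Assign each point to a BEV pillar (dynamic voxelization sketch).

    Only non-empty pillars are materialized, keyed by their integer grid
    index, so cost follows the number of occupied pillars, not the dense
    grid size. Ranges and pillar_size are illustrative, not from the paper.
    """
    ix = np.floor((points[:, 0] - x_range[0]) / pillar_size).astype(np.int64)
    iy = np.floor((points[:, 1] - y_range[0]) / pillar_size).astype(np.int64)
    pillars = {}
    for p, key in zip(points, zip(ix.tolist(), iy.tolist())):
        pillars.setdefault(key, []).append(p)
    return pillars

# Multi-receptive-field voxelization: pillarize the same cloud at several
# pillar sizes; small pillars preserve local detail, large pillars give
# each encoded feature a wider spatial context.
rng = np.random.default_rng(0)
pts = rng.uniform([0.0, -40.0, -3.0], [70.4, 40.0, 1.0], size=(1000, 3))
grids = {s: dynamic_pillarize(pts, s) for s in (0.16, 0.32, 0.64)}
for s, g in grids.items():
    print(f"pillar size {s}: {len(g)} non-empty pillars")
```

A feature encoder would then embed each scale's pillars separately, and a fusion module (the paper's channel-wise feature reconfiguration) would merge the resulting multi-scale BEV feature maps.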
