Abstract
3D object detection is an important yet challenging problem in a wide range of vision, robotics, and human–machine interaction applications. Given an RGB-D image, the task is to infer the class labels and the 3D bounding boxes of the objects in the image. While previous studies have made remarkable progress over the past decade, how to effectively fuse features with neural networks to boost 3D object detection performance remains an open problem. This paper proposes a multilevel fusion network (MFN) model to detect 3D objects in RGB-D images. The MFN model contains two neural network streams that extract RGB and depth features, respectively, with cascaded convolutional modules. To effectively exploit 3D object information, a multilevel fusion mechanism is adopted to fuse the convolutional RGB and depth features at multiple levels. To train the network, we propose a new weighted loss function that encodes the differences of geometric attributes in 3D bounding box regression. Since the raw depth data contains many noisy holes, we also develop an adaptive filtering algorithm to restore and correct the depth images. We evaluate the proposed model on challenging RGB-D datasets, and the experimental results demonstrate its strength and advantages.
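To make the two-stream, multilevel fusion idea concrete, the following is a minimal PyTorch sketch of how such a backbone could be organized. It is only an illustration of the general scheme described above: the layer counts, channel widths, and the concatenation-plus-1x1-convolution fusion are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # One cascaded convolutional module: conv -> batch norm -> ReLU -> downsample.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )


class MultilevelFusionBackbone(nn.Module):
    """Two-stream backbone that fuses RGB and depth features at every level.

    Illustrative sketch only: widths and the fusion operator are assumed,
    not taken from the paper.
    """

    def __init__(self, widths=(32, 64, 128)):
        super().__init__()
        rgb_in, depth_in = 3, 1
        self.rgb_blocks = nn.ModuleList()
        self.depth_blocks = nn.ModuleList()
        self.fuse_blocks = nn.ModuleList()
        for w in widths:
            self.rgb_blocks.append(conv_block(rgb_in, w))
            self.depth_blocks.append(conv_block(depth_in, w))
            # Fuse the two streams at this level via a 1x1 convolution
            # over their channel-wise concatenation.
            self.fuse_blocks.append(nn.Conv2d(2 * w, w, kernel_size=1))
            rgb_in, depth_in = w, w

    def forward(self, rgb, depth):
        fused_levels = []
        for rgb_block, depth_block, fuse in zip(
            self.rgb_blocks, self.depth_blocks, self.fuse_blocks
        ):
            rgb = rgb_block(rgb)
            depth = depth_block(depth)
            fused_levels.append(fuse(torch.cat([rgb, depth], dim=1)))
        # One fused feature map per level, available to a detection head.
        return fused_levels


if __name__ == "__main__":
    model = MultilevelFusionBackbone()
    rgb = torch.randn(1, 3, 128, 128)
    depth = torch.randn(1, 1, 128, 128)
    for i, feat in enumerate(model(rgb, depth)):
        print(f"level {i}: {tuple(feat.shape)}")
```

A detection head operating on these per-level fused features, the weighted 3D box regression loss, and the depth hole-filling step are part of the full model but are not sketched here, since the abstract does not specify their form.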