Abstract

Depth maps generated from sparse laser scan data are difficult to apply to computer vision tasks such as robot vision and autonomous driving because the data are sparse and noisy. To overcome this problem, depth completion tasks have been proposed to produce a dense depth map from sparse LiDAR data and a single RGB image. In this study, we developed a deep convolutional architecture with cross guidance for multi-modal feature fusion, which compensates for the limited representation power of each individual modality. The architecture includes two encoders that receive different modalities as inputs and interact by exchanging information at each encoding stage through an attention mechanism. We also propose a residual atrous spatial pyramid block, comprising multiple dilated convolutions with different dilation rates, which is used to derive highly significant features. Experimental results on the KITTI depth completion benchmark dataset demonstrate that the proposed architecture outperforms other models trained in a two-dimensional space, without pre-training or fine-tuning on other datasets.
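
To illustrate the idea of combining dilated convolutions with different dilation rates around a residual connection, the following is a minimal single-channel sketch in plain Python. It is not the architecture proposed in the paper: the function names `dilated_conv2d` and `residual_atrous_pyramid`, the 3x3 kernel size, the zero padding, and the choice to average the branches before adding the input are all assumptions made for illustration only.

```python
def dilated_conv2d(x, kernel, rate):
    """3x3 'same' cross-correlation with zero padding and dilation `rate`.

    `x` is a 2D list (H x W), `kernel` is a 3x3 list. Hypothetical helper.
    """
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for ki in range(3):
                for kj in range(3):
                    # Kernel taps are spaced `rate` pixels apart.
                    ii = i + (ki - 1) * rate
                    jj = j + (kj - 1) * rate
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside
                        acc += kernel[ki][kj] * x[ii][jj]
            out[i][j] = acc
    return out


def residual_atrous_pyramid(x, kernels, rates):
    """Average several dilated branches and add the input back (residual).

    Assumption: branches are averaged; a real block may instead concatenate
    them and mix channels with a learned 1x1 convolution.
    """
    branches = [dilated_conv2d(x, k, r) for k, r in zip(kernels, rates)]
    h, w = len(x), len(x[0])
    return [
        [x[i][j] + sum(b[i][j] for b in branches) / len(branches)
         for j in range(w)]
        for i in range(h)
    ]


if __name__ == "__main__":
    # A single non-zero pixel shows how each rate widens the receptive field.
    image = [[0.0] * 7 for _ in range(7)]
    image[3][3] = 1.0
    ones = [[1.0] * 3 for _ in range(3)]
    for row in residual_atrous_pyramid(image, [ones, ones, ones], [1, 2, 3]):
        print(["%.2f" % v for v in row])
```

With a single non-zero input pixel, each branch spreads that value to positions one, two, or three pixels away, which shows how larger dilation rates enlarge the receptive field without adding kernel weights.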

Highlights

  • An accurate depth map with an RGB image allows users to utilize the information to solve complicated computer vision tasks

  • We verify the performance of the proposed architecture by achieving state-of-the-art results on the KITTI depth completion benchmark dataset in two-dimensional (2D) space without pre-training or fine-tuning

  • We mainly focused on the root mean squared error (RMSE) for comparison because the RMSE is more sensitive to large errors and is the base metric of the KITTI depth completion benchmark
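
The sensitivity of the RMSE to large errors can be seen in a small worked example. The sketch below compares the RMSE with the mean absolute error (MAE) on two hypothetical predictions. The function names, the toy values, and the convention that a ground-truth value of 0 marks a pixel without a measurement are assumptions for illustration, not details taken from the paper.

```python
import math


def _valid_errors(pred, target, invalid=0.0):
    """Collect prediction errors at pixels that have a ground-truth value."""
    errors = []
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if t != invalid:  # assumption: `invalid` marks missing ground truth
                errors.append(p - t)
    if not errors:
        raise ValueError("no valid ground-truth pixels")
    return errors


def masked_rmse(pred, target, invalid=0.0):
    """Root mean squared error over valid pixels only."""
    errors = _valid_errors(pred, target, invalid)
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def masked_mae(pred, target, invalid=0.0):
    """Mean absolute error over valid pixels only."""
    errors = _valid_errors(pred, target, invalid)
    return sum(abs(e) for e in errors) / len(errors)


if __name__ == "__main__":
    target = [[10.0, 0.0], [20.0, 30.0]]        # 0.0 = no measurement
    pred_even = [[11.0, 5.0], [21.0, 31.0]]     # every valid pixel off by 1
    pred_outlier = [[10.0, 5.0], [20.0, 33.0]]  # one valid pixel off by 3
    for name, pred in (("even", pred_even), ("outlier", pred_outlier)):
        print(name, masked_mae(pred, target), masked_rmse(pred, target))
```

Both toy predictions have an MAE of 1.0, but the RMSE rises from 1.0 to about 1.73 when the same total error is concentrated in one pixel, which is the behavior the highlight refers to.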

Summary

Introduction

An accurate depth map paired with an RGB image provides information that helps solve complicated computer vision tasks. Depth maps acquired from a LiDAR sensor have sparse structures, so they cannot be applied directly to autonomous driving or robotics applications. To use LiDAR depth maps, the missing pixels must be filled in. To this end, depth completion tasks have been introduced in [1], [2]. A precise depth map, which is useful as prior information for processing an RGB image (e.g., object classification, detection, and segmentation), is valuable in both academic and industrial research. However, obtaining highly accurate, dense depth data is very expensive.

