Abstract

Cross spectral stereo matching is challenging because the differing spectral properties of the two images make correspondence estimation unreliable. In this paper, we propose joint disparity estimation and pseudo near infrared (NIR) image generation from cross spectral image pairs. To bridge the spectral gap between the paired images, we adopt difference map operations and non-local blocks to improve the local attention and global attention of the network. The proposed network is trained in an unsupervised manner and consists of one encoder and two decoders, which together perform both spectral translation and disparity estimation. For cooperative learning, we use the difference map operation to connect the two decoders, which improves the inference ability of the decoders even in regions with large spectral differences. Experimental results show that the proposed network performs well in cross spectral stereo matching in unreliable regions such as shadows and glass. Moreover, the proposed network generates pseudo NIR images that are nearly identical to the ground truth, even in regions with large spectral differences. In addition, owing to its low computational complexity, the network runs in real time at 27 FPS for $582\times 429$ image pairs on an RTX 2060 6G GPU.

Highlights

  • In recent years, self-driving cars [1], [2], robotics [3]–[6] and augmented reality [7]–[9] have gained much attention

  • We propose an end-to-end network for cross spectral stereo matching that consists of one encoder and two decoders

  • We adopt a non-local block (NLB) and a difference map operation (DMO) to bridge the spectral gap between the Y channel and the near infrared (NIR) image
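
The non-local block named in the last highlight can be illustrated with a minimal sketch of the generic non-local (self-attention) operation: every position attends to every other position, and the result is added back through a residual connection. This is not the paper's implementation. The function names are hypothetical, the feature map is assumed to be flattened into a list of positions, and the learned embeddings and output projection that a trained block would use are replaced by the identity to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local_block(features):
    """Apply a simplified non-local operation to flattened features.

    `features` is a list of N positions, each a list of C channel values
    (an H x W map flattened to N = H * W). Assumption: the learned
    embeddings and output projection are replaced by the identity.
    """
    outputs = []
    for query in features:
        # Dot-product similarity of this position to every position.
        scores = [sum(q * k for q, k in zip(query, key)) for key in features]
        weights = softmax(scores)
        # Attention-weighted sum over all positions (global context).
        context = [
            sum(w * value[c] for w, value in zip(weights, features))
            for c in range(len(query))
        ]
        # Residual connection keeps the original local feature.
        outputs.append([x + y for x, y in zip(query, context)])
    return outputs


# Three positions with two channels each.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = non_local_block(feats)
```

Because each output position mixes information from the whole map, a block of this kind gives the network a global receptive field in a single step, which is the role the highlights attribute to the NLB.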


Summary

INTRODUCTION

Self-driving cars [1], [2], robotics [3]–[6] and augmented reality [7]–[9] have gained much attention. Zhi et al. [17] proposed a spectral translation network based on channel weighting to generate a pseudo NIR image, and provided an unsupervised CNN framework to predict the disparity map. ActiveStereoNet addressed the oversmoothing problem in unsupervised disparity estimation and preserved edges effectively by handling occlusion. These methods achieve good performance in stereo matching between paired color images, but they are not suitable for cross spectral matching. Because CycleGAN consumes too much memory, Zhi et al. [17] proposed a spectral translation network (STN) to generate weights for the three RGB channels and estimate the pseudo NIR image. We use the features as the input of the disparity estimation decoder (DED) to
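
The channel-weighting idea attributed to the STN above can be sketched as a weighted sum of the R, G and B values at each pixel. This is only an illustration under stated assumptions: the function name `pseudo_nir` is hypothetical, the weights are assumed to be given per pixel, and in the actual method they would be predicted by the network rather than supplied by hand.

```python
def pseudo_nir(rgb, weights):
    """Blend RGB channels into a single-channel pseudo NIR image.

    rgb:     H x W nested list of (r, g, b) tuples in [0, 1].
    weights: H x W nested list of (w_r, w_g, w_b) tuples. Assumption:
             hand-supplied here, network-predicted in a real system.
    """
    return [
        [
            sum(w * c for w, c in zip(w_px, rgb_px))
            for rgb_px, w_px in zip(rgb_row, w_row)
        ]
        for rgb_row, w_row in zip(rgb, weights)
    ]


# One row with a pure red pixel and a pure green pixel.
rgb = [[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
weights = [[(0.5, 0.25, 0.25), (0.5, 0.25, 0.25)]]
nir = pseudo_nir(rgb, weights)
```

Predicting three weights instead of synthesizing a full image keeps the translation step lightweight, which fits the memory concern raised about CycleGAN.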

NON-LOCAL BLOCK
DIFFERENCE MAP OPERATION
LOSS FUNCTION
Findings
CONCLUSION