Despite the considerable success of deep learning methods in stereo matching for binocular images, the generalizability and robustness of these algorithms remain uncertain, particularly under challenging conditions such as occlusions or degraded infrared textures. This paper presents a novel deep-learning-based depth optimization method that obviates the need for large infrared image datasets and adapts seamlessly to any specific infrared camera. This adaptability also extends to standard binocular images, allowing the method to work effectively on both infrared and visible-light stereo pairs. We further investigate the role of infrared textures in a deep learning framework, demonstrating their continued utility for stereo matching even in complex lighting environments. To compute the matching cost volume, we apply a multi-scale census transform to the input stereo images; a stacked hourglass subnetwork is subsequently employed to address the matching task. Our approach substantially improves adversarial robustness while maintaining accuracy: in quantitative results on widely used autonomous driving datasets, it reduces end-point error (EPE) by nearly half compared with state-of-the-art methods. Furthermore, the proposed method exhibits superior generalization, transitioning from simulated datasets to real-world datasets without the need for fine-tuning.
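The cost-volume construction mentioned above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the window sizes, descriptor width, and Hamming-distance matching are standard choices assumed here for clarity. Each pixel is encoded as a bit string comparing it with its neighbours (the census transform), repeated at several window sizes, and matching cost at each candidate disparity is the Hamming distance between left and right descriptors.

```python
import numpy as np

def census_transform(img, window=5):
    """Census transform: encode each pixel as a bit string recording whether
    each neighbour in a (window x window) patch is darker than the centre."""
    h, w = img.shape
    r = window // 2
    padded = np.pad(img, r, mode="edge")
    desc = np.zeros((h, w), dtype=np.uint64)  # up to 7x7 windows fit in 64 bits
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy == 0 and dx == 0:
                continue
            shifted = padded[r + dy : r + dy + h, r + dx : r + dx + w]
            desc = (desc << np.uint64(1)) | (shifted < img).astype(np.uint64)
    return desc

def multi_scale_census(img, windows=(3, 5, 7)):
    """One census descriptor map per window size (the 'multi-scale' part)."""
    return [census_transform(img, w) for w in windows]

def hamming_cost_volume(left_desc, right_desc, max_disp):
    """Cost volume of shape (max_disp, H, W): Hamming distance between the
    left descriptor at x and the right descriptor at x - d."""
    h, w = left_desc.shape
    cost = np.full((max_disp, h, w), 64.0, dtype=np.float32)  # 64 = worst case
    for d in range(max_disp):
        diff = left_desc[:, d:] ^ right_desc[:, : w - d]
        # popcount: view each uint64 as 8 bytes and count the set bits
        bits = np.unpackbits(diff.view(np.uint8).reshape(h, w - d, 8), axis=-1)
        cost[d, :, d:] = bits.sum(axis=-1)
    return cost
```

In a full pipeline, one such volume per scale would be fed to the matching subnetwork for regularization and disparity regression; the census transform is attractive for infrared imagery because it depends only on local intensity ordering, not absolute brightness.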