Monocular depth estimation provides low-cost environmental information for intelligent systems such as autonomous vehicles and robots, supporting sustainable development by reducing reliance on expensive, energy-intensive sensors and making the technology more accessible and efficient. In practical applications, however, monocular vision is highly susceptible to adverse weather, which significantly degrades depth perception accuracy and limits its ability to deliver reliable environmental information. To improve the robustness of monocular depth estimation under challenging weather, this paper first uses generative models to adjust image exposure and synthesize rainy, foggy, and nighttime scenes, enriching the diversity of the training data. Next, a channel interaction module and a multi-scale fusion module are introduced: the former enhances information exchange between channels, while the latter effectively integrates multi-level feature information. Finally, an enhanced consistency loss is added to the loss function to prevent the depth estimation bias caused by data augmentation. Experiments on datasets such as DrivingStereo, Foggy Cityscapes, and nuScenes-Night demonstrate that our method, CIT-Depth, exhibits superior generalization across a variety of complex conditions.
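For intuition only, the sketch below shows one way an augmentation-consistency term of this kind could be implemented; the function name, the network handle `depth_net`, the weighting factor, and the L1-in-log-depth form are assumptions for illustration, not the paper's actual formulation.

```python
import torch
import torch.nn.functional as F

def augmentation_consistency_loss(depth_net, clean_img, aug_img, weight=1.0):
    """Hypothetical consistency term: depth predicted on a weather-augmented
    image should agree with depth predicted on the original clean image,
    discouraging the bias that synthetic rain/fog/night augmentation can
    introduce. An L1 penalty in log-depth space is assumed here; the
    network is assumed to output strictly positive depth values."""
    with torch.no_grad():
        depth_clean = depth_net(clean_img)   # reference prediction, no gradient
    depth_aug = depth_net(aug_img)           # prediction on the augmented input
    eps = 1e-6                               # numerical stability for the log
    return weight * F.l1_loss(torch.log(depth_aug + eps),
                              torch.log(depth_clean + eps))
```

In practice such a term would simply be added to the existing supervised or self-supervised depth losses during training, with the clean image acting as a fixed target so gradients flow only through the augmented branch.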