Introduction: To address low visibility, difficult object recognition, and low detection accuracy in foggy weather, this paper introduces WViT-YOLO, a real-time object detection model for foggy conditions built on the YOLOv5 framework. Its NVIT-Net backbone network, incorporating NTB and NCB modules, strengthens the model's ability to extract both global and local image features.

Method: An efficient convolutional C3_DSConv module is designed and combined with a channel attention mechanism and ShuffleAttention at each upsampling stage, improving the model's computational speed and its ability to detect small and blurry objects. The Wise-IoU loss function is used in the prediction stage to improve convergence efficiency.

Result: Experiments on the publicly available RTTS dataset for vehicle detection in foggy conditions show that WViT-YOLO improves precision by 3.2%, recall by 9.5%, and mAP50 by 8.6% over the baseline model. WViT-YOLO also achieves mAP50 improvements of 9.5% and 8.6% over YOLOv7 and YOLOv8, respectively. For small and blurry objects in fog, the model gains approximately 5% over the benchmark, markedly improving the detection network's generalization under foggy conditions.

Conclusion: These improvements are important for vehicle safety in foggy weather. The code is available at https://github.com/QinghuaZhang1/mode.
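To illustrate the Wise-IoU idea mentioned in the Method section, the sketch below implements a Wise-IoU v1-style bounding-box loss in plain Python: the standard IoU loss is scaled by a distance-based focusing factor computed from the smallest box enclosing the predicted and ground-truth boxes. This is a minimal, framework-free sketch for intuition only; the helper names (`iou`, `wise_iou_v1_loss`) are illustrative and not taken from the paper's code, and the released implementation may differ (e.g. batched tensors, gradient detachment of the focusing factor).

```python
import math

def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def wise_iou_v1_loss(pred, target):
    """Wise-IoU v1-style loss: IoU loss weighted by a focusing factor
    that grows as the box centres drift apart, normalized by the size
    of the smallest enclosing box."""
    # Centres of the predicted and ground-truth boxes.
    px, py = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tx, ty = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    # Width/height of the smallest enclosing box.
    wg = max(pred[2], target[2]) - min(pred[0], target[0])
    hg = max(pred[3], target[3]) - min(pred[1], target[1])
    # Distance-based focusing factor R_WIoU (>= 1).
    r = math.exp(((px - tx) ** 2 + (py - ty) ** 2) / (wg ** 2 + hg ** 2))
    return r * (1.0 - iou(pred, target))
```

A perfectly aligned prediction yields zero loss, while a prediction whose centre is displaced is penalized both through the lower IoU and through the exponential focusing factor, which is what helps convergence on hard, poorly localized boxes.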