This paper proposes RFCS-YOLO, a detection model based on receptive field enhancement and cross-scale fusion, to address complex backgrounds and the missed and false detections that affect traffic targets in adverse weather. First, an efficient feature extraction module (EFEM) reconfigures the backbone network, enlarging the receptive field and improving the extraction of features from targets at different scales. Next, a cross-scale fusion module (CSF) uses a receptive field coordinate attention mechanism (RFCA) to fuse information across scales effectively while filtering out interfering noise and background information. Finally, a new Focaler-Minimum Point Distance Intersection over Union (F-MPDIoU) loss function is proposed, which accelerates convergence and mitigates missed and false detections. Experiments on the expanded Vehicle Detection in Adverse Weather Nature (DAWN) dataset show significant improvements over the conventional You Only Look Once v7 (YOLOv7) model: mean Average Precision (mAP@0.5), precision, and recall increase by 4.2%, 8.3%, and 1.4%, respectively, with mAP@0.5 reaching 86.5%, and the frame rate of 68 frames per second (FPS) meets real-time detection requirements. In a generalization experiment on the autonomous driving dataset SODA10M, the model achieves an mAP@0.5 of 56.7%, a 3.6% improvement over the original model, demonstrating the good generalization ability of the proposed method.
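The abstract does not give the F-MPDIoU formula. The sketch below is a minimal illustration assuming the name follows the common convention of applying the Focaler-IoU linear interval re-mapping on top of the MPDIoU corner-distance penalty; the function name, the interval bounds `d` and `u`, and the exact combination rule are assumptions for illustration, not the authors' implementation.

```python
import torch

def f_mpdiou_loss(pred, target, img_w, img_h, d=0.0, u=0.95):
    """Illustrative Focaler-MPDIoU loss sketch (assumed formulation).

    pred, target: (N, 4) boxes as (x1, y1, x2, y2) in image pixels.
    img_w, img_h: input image size, used by MPDIoU to normalize corner distances.
    d, u: Focaler interval bounds for the linear IoU re-mapping (assumed defaults).
    """
    # Intersection.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]

    # Union and plain IoU.
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter).clamp(min=1e-7)

    # MPDIoU: subtract the squared distances between the top-left and
    # bottom-right corner pairs, normalized by the squared image diagonal.
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    diag2 = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1 / diag2 - d2 / diag2

    # Focaler re-mapping: rescale IoU linearly inside [d, u] so training
    # focuses on a chosen difficulty range of samples.
    iou_focaler = ((iou - d) / (u - d)).clamp(min=0.0, max=1.0)

    # Combine as in other Focaler-augmented losses: L = L_MPDIoU + IoU - IoU_focaler.
    return (1.0 - mpdiou) + iou - iou_focaler
```

Lower values correspond to better-aligned boxes; the MPDIoU corner-distance term keeps gradients informative even when boxes do not overlap, while the Focaler re-mapping reweights easy versus hard samples, which is consistent with the faster convergence and fewer missed and false detections the abstract reports.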