Residual Transformer YOLO for Detecting Multi-Scale Crowded Pedestrian

Hechao Ye, Yanni Wang

doi:10.3390/app132112032

Abstract

Crowding and occlusion pose significant challenges for pedestrian detection and readily cause missed and false detections of small-scale and occluded pedestrians in dense scenes. To improve detection accuracy in dense pedestrian scenes, we propose the Residual Transformer YOLO (RT-YOLO) algorithm. RT-YOLO builds on YOLOv7, enhancing its multi-scale fusion strategy and introducing a dedicated detection layer for small-scale occluded targets. It also integrates ResNet and Transformer structures to improve the small-scale feature layer and detection head, strengthening feature extraction. In addition, RT-YOLO incorporates the Normalization-based Attention Module (NAM) into the backbone and neck networks to identify regions of interest. Experiments on the CrowdHuman and WiderPerson datasets show that mAP50, measured at an Intersection over Union (IoU) threshold of 0.5, improves by 3.8% and 3.4%, respectively, and that mAP50:95, averaged over IoU thresholds from 0.5 to 0.95, improves by 5.1% and 4%, respectively. RT-YOLO runs at 67 FPS, maintaining real-time performance. On the VOC2007 dataset, mAP50 improves by 5.1%, indicating higher effectiveness and robustness.
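
The reported metrics depend on the IoU threshold used to decide whether a predicted box matches a ground-truth box. As a reading aid, the following is a minimal sketch of that computation, not the evaluation code used in the paper. The function names `iou` and `is_match`, the constant `MAP_50_95_THRESHOLDS`, and the `(x1, y1, x2, y2)` corner format are assumptions chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes.

    Boxes are assumed to be (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    """
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


def is_match(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a match when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


# Thresholds conventionally averaged for mAP50:95: 0.50, 0.55, ..., 0.95.
MAP_50_95_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]
```

At a threshold of 0.5 (the mAP50 setting), a loosely localized box can still count as correct, whereas the higher thresholds included in mAP50:95 reward tighter localization. This matters in crowded scenes, where a box drifting onto a neighbouring pedestrian quickly loses overlap with its own target.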

Full Text