Detecting and recognizing low-contrast, small-sized targets in infrared detection scenarios has long been a challenge in computer vision, particularly in complex road traffic environments. Traditional target detection methods usually perform poorly on small infrared targets, mainly because they cannot extract key features effectively and because significant feature loss occurs during feature transmission. To address these issues, this paper proposes a fast detection and recognition model based on a multi-scale self-attention mechanism, designed specifically for small road targets in infrared detection scenarios. We first introduce and improve the DyHead structure on top of the YOLOv8 algorithm, using a multi-head self-attention mechanism to capture target features at various scales and enhance the model's perception of small targets. Additionally, to prevent the information loss that occurs when features are transmitted through the FPN structure of traditional YOLO algorithms, we introduce and enhance the Gather-and-Distribute Mechanism. By computing dependencies between features using self-attention, it reallocates attention weights in the feature maps to highlight important features and suppress irrelevant information. These improvements significantly enhance the model's ability to detect small targets. Moreover, to further increase detection speed, we prune the network architecture to reduce computational complexity and parameter count, making the model suitable for real-time processing scenarios. Experiments on our self-built infrared road traffic dataset (which mainly contains two target classes, vehicles and people) show that, compared with the baseline, our method achieves a 3.1% improvement in AP and a 2.5% increase in mAP on the VisDrone2019 dataset, with significant gains in both detection accuracy and processing speed for small targets, as well as improved robustness and adaptability.
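The attention-based reweighting described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single attention head, omits the learned query/key/value projections that a real DyHead or Gather-and-Distribute module would use, and treats each spatial position of a feature map as one feature vector. The function name `self_attention_reweight` is hypothetical.

```python
import math


def self_attention_reweight(tokens):
    """Reweight feature vectors with scaled dot-product self-attention.

    tokens: list of N feature vectors, each a list of C floats.
    Assumption: queries, keys and values are the raw features themselves
    (no learned projections), and only a single head is used.
    Returns (attended, weights): attended is N x C, weights is N x N.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    weights = []
    for query in tokens:
        # Dependency between this position and every other position.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        # Softmax with the row maximum subtracted for numerical stability.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    # Each output vector is the attention-weighted sum of all input vectors.
    attended = [
        [sum(w * value[d] for w, value in zip(row, tokens)) for d in range(dim)]
        for row in weights
    ]
    return attended, weights


if __name__ == "__main__":
    # Three toy positions: the first two are similar, the third is different.
    features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    attended, weights = self_attention_reweight(features)
    for row in weights:
        print([round(w, 3) for w in row])
```

In this toy input the first position assigns more weight to the similar second position than to the dissimilar third one, which is the sense in which attention highlights related features and suppresses unrelated ones.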