Abstract

Achieving rapid and precise detection of pedestrians in natural environments is crucial for many applications of artificial intelligence systems. However, accurate pedestrian detection at nighttime is challenging due to the low luminance and low resolution of infrared images, and real-time detection speed is also required. In this paper, a real-time pedestrian detection method using a novel multi-modal attention fusion YOLO (MAF-YOLO) was proposed. Firstly, a multi-modal feature extraction module based on a compressed Darknet53 framework was built to adapt to nighttime pedestrian detection while ensuring efficiency. Features were extracted from both modalities and then fused by a modal weighted fusion module. Secondly, we defined a loss function and regenerated the anchor-box sizes with the K-means clustering algorithm to improve the detection speed and the robustness to small objects. Finally, a dual attention module was applied to acquire more semantic features from small, low-resolution objects. The experimental results on the KAIST and OSU Color-Thermal datasets corroborated the effectiveness of the proposed MAF-YOLO. The proposed modules could also be utilized to improve the performance of other pedestrian detection algorithms.
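The abstract does not give implementation details for the anchor regeneration step; the following is a minimal sketch of the standard YOLO-style procedure it likely refers to, in which K-means clusters the (width, height) pairs of the training-set bounding boxes using 1 − IoU as the distance metric. The function names and the choice of NumPy are illustrative assumptions, not the authors' code.

```python
import numpy as np

def iou_wh(boxes, anchors):
    """IoU between (w, h) pairs, treating boxes as sharing a top-left corner.

    boxes: (N, 2) array of box widths/heights; anchors: (K, 2) array."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_b = boxes[:, 0] * boxes[:, 1]
    area_a = anchors[:, 0] * anchors[:, 1]
    return inter / (area_b[:, None] + area_a[None, :] - inter)

def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Cluster (w, h) box sizes with 1 - IoU distance; return k anchors
    sorted by area (smallest first), as commonly done for YOLO heads."""
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each box to the anchor with the highest IoU (lowest 1 - IoU).
        assign = np.argmax(iou_wh(boxes, anchors), axis=1)
        # Recompute each anchor as the mean (w, h) of its assigned boxes.
        new = np.array([boxes[assign == i].mean(axis=0)
                        if np.any(assign == i) else anchors[i]
                        for i in range(k)])
        if np.allclose(new, anchors):
            break
        anchors = new
    return anchors[np.argsort(anchors[:, 0] * anchors[:, 1])]
```

Running this on the ground-truth boxes of a dataset such as KAIST would yield k anchor sizes matched to that dataset's pedestrian scale distribution, which is what makes the detector more robust to small objects than generic COCO-derived anchors.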
