Abstract

Multispectral pedestrian detection has received much attention in recent years due to its superiority in detecting targets under adverse lighting/weather conditions. In this paper, we aim to generate highly discriminative multi-modal features by aggregating human-related clues from all available samples present in multispectral images. To this end, we present a novel multispectral pedestrian detector that performs locality guided cross-modal feature aggregation and pixel-level detection fusion. Given a number of single-modality bounding boxes covering pedestrians in both modalities, we deploy two segmentation sub-branches to predict the existence of pedestrians in the visible and thermal channels. By referring to the locality information in the reference modality, we perform locality guided cross-modal feature aggregation, exploiting the clues of all available pedestrians to learn highly discriminative human-related features in the complementary modality. Moreover, we utilize the obtained spatial locality maps to provide prediction confidence scores for the visible and thermal channels and conduct pixel-wise adaptive fusion of the detection results from the complementary modalities. Extensive experiments demonstrate the effectiveness of our proposed method, which outperforms current state-of-the-art detectors on both the KAIST and CVC-14 multispectral pedestrian detection datasets.
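
As a rough illustration of the pixel-wise adaptive fusion idea described above (not the authors' exact formulation), the sketch below weights each modality's detection score map by its predicted spatial locality (confidence) map, normalized per pixel. All tensor names, shapes, and the normalization scheme are illustrative assumptions.

```python
import numpy as np

def pixelwise_adaptive_fusion(det_vis, det_thm, conf_vis, conf_thm, eps=1e-6):
    """Fuse visible/thermal detection maps with per-pixel confidence weights.

    det_vis, det_thm   : HxW detection score maps from each modality.
    conf_vis, conf_thm : HxW spatial locality (confidence) maps.
    This is a hedged sketch, not the paper's implementation.
    """
    # Normalize the two confidence maps so they sum to 1 at every pixel.
    total = conf_vis + conf_thm + eps
    w_vis = conf_vis / total
    w_thm = conf_thm / total
    # Pixel-wise weighted combination of the two detection score maps.
    return w_vis * det_vis + w_thm * det_thm

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W = 4, 4
    det_vis, det_thm = rng.random((H, W)), rng.random((H, W))
    conf_vis, conf_thm = rng.random((H, W)), rng.random((H, W))
    fused = pixelwise_adaptive_fusion(det_vis, det_thm, conf_vis, conf_thm)
    print(fused.shape)  # (4, 4)
```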
