Abstract

Because multispectral image pairs (RGB and thermal) provide more information than single-modality images, pedestrian detection based on multispectral images has received wide attention. Existing multispectral networks mainly focus on the misalignment of image pairs and the differences between modalities. However, these networks lack effective information interaction between the two feature streams and do not account for the scale characteristics of pedestrians. To address these issues, we propose a high-performance network, the dual-stream interaction and multi-scale feature extraction network (DSI-MSE), which consists of a dual-stream feature interaction (DSI) block, a multi-scale feature extraction (MSE) block, and a detection (DET) block. The DSI block extracts features through interaction between the RGB and thermal streams, fusing intra-modal and inter-modal information. The MSE block uses multiple parallel branches to match pedestrians at different scales, which enhances the expressiveness of the features and yields richer representations at each scale. Experimental results on the KAIST and CVC-14 datasets demonstrate that DSI-MSE achieves state-of-the-art results on multispectral pedestrian detection.
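The abstract names the two core ideas, dual-stream interaction and parallel multi-scale branches, without giving layer details. As a rough illustration only, the NumPy sketch below shows one way such blocks could be wired on (C, H, W) feature maps. Everything in it is an assumption and is not the DSI-MSE implementation: the function names (`dual_stream_interaction`, `dilated_box3x3`, `multi_scale_extraction`), the sigmoid channel gating used for the inter-modal exchange, and the fixed dilated 3x3 mean filters that stand in for learned convolutions.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def dual_stream_interaction(f_rgb, f_thermal):
    """Hypothetical DSI-style exchange between two (C, H, W) feature maps."""
    # Inter-modal path (assumed): per-channel gates from each stream's global average.
    gate_for_rgb = sigmoid(f_thermal.mean(axis=(1, 2), keepdims=True))
    gate_for_thermal = sigmoid(f_rgb.mean(axis=(1, 2), keepdims=True))
    # Intra-modal path: each stream keeps its own features (residual term)
    # and adds the other stream's features scaled by the gate.
    out_rgb = f_rgb + gate_for_rgb * f_thermal
    out_thermal = f_thermal + gate_for_thermal * f_rgb
    # Fuse the two updated streams by stacking them along the channel axis.
    return np.concatenate([out_rgb, out_thermal], axis=0)


def dilated_box3x3(x, d):
    """3x3 mean filter with dilation d and zero padding (no learned weights)."""
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (d, d), (d, d)))
    out = np.zeros_like(x, dtype=float)
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            # Shifted view that aligns the neighbour at offset (di*d, dj*d).
            out += padded[:, d + di * d : d + di * d + h, d + dj * d : d + dj * d + w]
    return out / 9.0


def multi_scale_extraction(x, dilations=(1, 2, 3)):
    """Hypothetical MSE-style block: parallel branches, one per dilation rate."""
    # A larger dilation gives a larger receptive field, intended to cover larger pedestrians.
    branches = [dilated_box3x3(x, d) for d in dilations]
    return np.concatenate(branches, axis=0)


rgb = np.ones((4, 8, 8))
thermal = np.zeros((4, 8, 8))
fused = dual_stream_interaction(rgb, thermal)
features = multi_scale_extraction(fused)
print(fused.shape, features.shape)  # (8, 8, 8) (24, 8, 8)
```

Concatenating the branch outputs keeps small-receptive-field and large-receptive-field responses in separate channels, so later layers (for example a detection head) can weigh them independently; a trained network would replace the fixed mean filters with learned dilated convolutions.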
