Abstract

Multispectral pedestrian detection is a difficult task, especially when pedestrians appear at different sizes in the image. In most convolutional neural network (CNN) models, the receptive fields within each layer share the same size, which limits the detection of pedestrians at multiple scales. In this paper, we propose a dynamic selection scheme that adaptively adjusts the receptive field size in multispectral pedestrian detection. Specifically, a network in network (NIN) is employed to combine visible and thermal information. Selective kernel networks (SKNets), which use selective kernel units with different kernel sizes, are then employed. To effectively fuse the feature representations in each layer, a building block is designed in which different features are fused. A feature pyramid is employed to integrate the feature information from each layer. We empirically show that our method outperforms 8 existing state-of-the-art methods on one multispectral dataset and 4 state-of-the-art methods on another multispectral dataset. Detailed analyses show that our proposed method can detect pedestrians of different scales in multispectral images, which confirms the effectiveness of SKNets for adaptively adjusting the receptive field size. In addition, our method can operate at 14 frames per second (fps) on a GPU.
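
To make the idea of adaptive receptive field selection concrete, the following is a minimal NumPy sketch of the fuse-and-select step of a selective kernel unit. It is not the authors' implementation. It assumes that the outputs of the convolution branches (one per kernel size) are already computed, and it omits batch normalization. The function name selective_kernel_fuse, the weight names w_reduce and w_select, and all shapes are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def selective_kernel_fuse(branches, w_reduce, w_select):
    """Fuse M branch outputs with per-channel soft attention.

    branches: (M, C, H, W) feature maps, one per kernel size (assumed given)
    w_reduce: (d, C) weights compressing the channel descriptor (hypothetical)
    w_select: (M, C, d) weights producing one logit per branch and channel
    Returns the fused map (C, H, W) and the attention weights (M, C).
    """
    u = branches.sum(axis=0)             # fuse: element-wise sum of branches
    s = u.mean(axis=(1, 2))              # global average pooling -> (C,)
    z = np.maximum(w_reduce @ s, 0.0)    # compact descriptor with ReLU -> (d,)
    logits = w_select @ z                # (M, C, d) @ (d,) -> (M, C)
    attn = softmax(logits, axis=0)       # branches compete for each channel
    fused = (attn[:, :, None, None] * branches).sum(axis=0)
    return fused, attn


# Toy usage with random data standing in for 3x3 and 5x5 branch outputs.
rng = np.random.default_rng(0)
M, C, H, W, d = 2, 8, 16, 16, 4
branches = rng.standard_normal((M, C, H, W))
w_reduce = rng.standard_normal((d, C)) * 0.1
w_select = rng.standard_normal((M, C, d)) * 0.1
fused, attn = selective_kernel_fuse(branches, w_reduce, w_select)
```

Because the attention weights sum to one across branches for every channel, each channel can lean toward the branch whose kernel size suits the input, which is the sense in which the receptive field is adjusted adaptively.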

