Selective Kernel and Spatial Grouping Attention Network for Occluded Pedestrian Detection

Yaru Wang, Qiang Zhang, Hua Yu, Yijing Li

doi:10.1109/icvr55215.2022.9847953

Abstract

Pedestrian detection has made significant progress as a computer vision task in recent years. Most pedestrian detection methods employ deep convolutional neural networks to extract abstract features. However, convolution is a local operation that relies on down-sampling to obtain high-level semantic features, so it can neither capture global image information nor selectively focus on the input features. Furthermore, since the majority of a pedestrian's body is invisible under severe occlusion, the performance of existing pedestrian detectors still leaves room for improvement. To this end, we propose a novel network with selective kernel and spatial grouping attention, i.e., SKGNet, for the occluded pedestrian detection task. Specifically, we first introduce a lightweight attention module, selective kernel and spatial grouping attention (SKG), which is embedded in SKGNet's feature extraction backbone. The SKG module combines the properties of the selective kernel (SK) and spatial grouping enhancement (SGE) mechanisms to extract more critical features and improve the expressive ability of the feature maps, ultimately improving the detection performance of the network. Moreover, we propose a mask-guided (MG) module to modulate full-body features, which highlights the visible parts of pedestrians while suppressing the occluded parts, thereby significantly improving detection performance under occlusion. Extensive experiments show that SKGNet outperforms existing advanced methods on the CityPersons dataset without adding excessive parameters or computation.
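The abstract does not say how the MG module modulates the full-body features. As an illustration only, the sketch below assumes one common form of mask guidance: a per-pixel visibility score is passed through a sigmoid and multiplied element-wise into every channel of the feature map. The function name mask_guided_modulate, the nested-list layout (channels x height x width), and the toy values are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw mask score (logit) to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def mask_guided_modulate(features, mask_logits):
    """Scale every channel of `features` by a spatial visibility weight.

    features:    C x H x W nested lists of floats (hypothetical layout).
    mask_logits: H x W nested lists; high values mark visible regions.
    This is an assumed form of mask guidance, not the paper's MG module.
    """
    # One weight per spatial position, shared across all channels.
    weights = [[sigmoid(v) for v in row] for row in mask_logits]
    return [
        [
            [f * w for f, w in zip(f_row, w_row)]
            for f_row, w_row in zip(channel, weights)
        ]
        for channel in features
    ]


# Toy example: the top row is "visible", the bottom row is "occluded".
features = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
mask_logits = [[6.0, 6.0], [-6.0, -6.0]]
modulated = mask_guided_modulate(features, mask_logits)
```

In this toy case the top-row responses pass through almost unchanged, while the bottom-row responses are driven close to zero, which mirrors the stated goal of highlighting visible parts and suppressing occluded ones.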

Full Text