Highlights

• We design a novel semantic-refined spatial pyramid network (SSPNet).
• The SSPNet facilitates the fusion of multi-scale information.
• The SSPNet enhances the representation ability of deep semantic features.
• Multi-scale features are adaptively integrated by our method.
• Texture features are extracted to enrich the multi-scale features.

Abstract

In this paper, we propose a novel encoder-decoder model, the Semantic-refined Spatial Pyramid Network (SSPNet), for generating high-quality density maps; it builds a scale-aware counting network that estimates crowd counts accurately. The SSPNet consists of a VGG-16-based front-end, a spatial pyramid multi-scale module (SPMM), and a semantic enhancement module (SEM). First, a series of convolutional layers serves as the front-end to extract deeper features without extra computational cost. Then, the SPMM, a spatial pyramid structure with multiple receptive fields, captures multi-scale features. Next, the SEM refines the features captured by the SPMM, using deep semantic information to better integrate the multi-scale features. Finally, shallow texture information compensates for details lost in the feature map, improving the quality of the density map. Extensive experiments and comparisons on three challenging datasets, ShanghaiTech Part_A & Part_B, UCF_CC_50, and UCF-QNRF, demonstrate the superiority of our method.
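The abstract does not give implementation details of the SPMM, but the idea of parallel branches with different receptive fields can be illustrated with a minimal numpy sketch. Here, mean filters of growing window size stand in for convolution branches with different receptive fields, and a fixed weighted sum stands in for the learned adaptive fusion; the function names `box_filter` and `spmm_sketch` and the kernel sizes are illustrative assumptions, not the authors' code.

```python
import numpy as np

def box_filter(x, k):
    """Mean filter with a k x k window and reflect padding (output keeps x's shape)."""
    p = k // 2
    xp = np.pad(x, p, mode="reflect")
    # Integral image: each window sum in O(1) per pixel.
    c = np.cumsum(np.cumsum(xp, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))
    h, w = x.shape
    return (c[k:k+h, k:k+w] - c[:h, k:k+w] - c[k:k+h, :w] + c[:h, :w]) / (k * k)

def spmm_sketch(feat, kernel_sizes=(3, 5, 7), weights=None):
    """Toy spatial-pyramid fusion: branches with growing receptive fields,
    combined by a weighted sum (a stand-in for learned adaptive integration)."""
    branches = [feat] + [box_filter(feat, k) for k in kernel_sizes]
    if weights is None:
        weights = np.full(len(branches), 1.0 / len(branches))  # equal weights, assumed
    return sum(w * b for w, b in zip(weights, branches))

# Demo: a 16x16 feature map keeps its spatial size after fusion.
feat = np.random.rand(16, 16)
fused = spmm_sketch(feat)
print(fused.shape)  # (16, 16)
```

In the actual network each branch would be a convolutional layer and the fusion weights would be learned end-to-end; this sketch only shows why combining several receptive fields yields a scale-aware response.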
