• We propose a density-aware and background-aware multi-task learning network for crowd counting.
• A multi-task joint loss is proposed for our multi-task learning network.
• Extensive experiments are conducted on three crowd counting benchmark datasets and promising results are achieved.

In this paper, we propose a density-aware and background-aware network via multi-task learning (MTL-DB) for crowd counting. It aims to enable the model to capture high-level semantic information about density and background through multi-task joint training, which may in turn improve the generation of density maps. First, MTL-DB uses the first ten layers of VGG-16 with Batch Normalization as the front-end to extract primary features that are shared by all tasks. Then, a multi-task back-end is constructed by integrating the main task of density map estimation with two auxiliary tasks: density classification and background segmentation. The density classification auxiliary task captures density-related information with a fully connected classifier, while the background segmentation auxiliary task applies a dilated convolutional network to distinguish pedestrian head regions from the background. With this high-level semantic awareness, the main task generates estimated density maps using normal convolutional layers. Furthermore, a multi-task joint loss is proposed to improve the quality of the estimated density maps. Extensive experiments on three challenging crowd datasets (ShanghaiTech Part A & B, UCF_CC_50, and UCF_QNRF) verified the effectiveness of this multi-task learning model. MTL-DB outperformed other multi-task learning methods on both Part A and Part B of the ShanghaiTech dataset.
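The abstract does not give the form of the multi-task joint loss, so the following is only a minimal sketch of how one main-task term and two auxiliary terms could be combined. It assumes a mean squared error for density map estimation, a softmax cross-entropy for density classification, and a pixel-wise binary cross-entropy for background segmentation. The function names, the choice of loss terms, and the weights `w_cls` and `w_seg` are all hypothetical and are not taken from the paper.

```python
import math


def mse_loss(pred, target):
    """Mean squared error between two flattened density maps."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cross_entropy(logits, label):
    """Softmax cross-entropy for one density-level label (log-sum-exp form)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def binary_cross_entropy(probs, mask, eps=1e-7):
    """Pixel-wise BCE between head-region probabilities and a 0/1 mask."""
    total = 0.0
    for p, y in zip(probs, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def joint_loss(density_pred, density_gt, class_logits, class_label,
               seg_probs, seg_mask, w_cls=0.1, w_seg=0.1):
    """Weighted sum of the main-task loss and the two auxiliary losses.

    The weights are illustrative assumptions, not values from the paper.
    """
    main = mse_loss(density_pred, density_gt)
    cls = cross_entropy(class_logits, class_label)
    seg = binary_cross_entropy(seg_probs, seg_mask)
    return main + w_cls * cls + w_seg * seg


if __name__ == "__main__":
    loss = joint_loss(
        density_pred=[0.0, 0.2, 0.1], density_gt=[0.0, 0.25, 0.1],
        class_logits=[0.3, 1.2, -0.5], class_label=1,
        seg_probs=[0.1, 0.9, 0.8], seg_mask=[0, 1, 1],
    )
    print(f"joint loss: {loss:.4f}")
```

In a sketch like this, the auxiliary weights control how strongly the density and background signals shape the shared front-end features relative to the main density map objective.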