Abstract

Accurate counting in dense scenes can help prevent abnormal events, which is crucial for flow management, traffic control, and urban safety. In recent years, applying deep learning to counting tasks has significantly improved model performance, but many challenges remain, including the diverse distribution of targets relative to the image background, drastic changes in target scale, and severe occlusion. To address these problems, this paper proposes a spatial context feature fusion network (SCFFNet) that understands highly congested scenes, counts accurately, and produces high-quality estimated density maps. SCFFNet first applies a rich set of convolutions at different scales to compute scale-aware features, adaptively encoding the scale of contextual information needed to estimate density maps accurately. It then calibrates and refines the fused feature maps with a channel-spatial attention-aware module, which improves the model’s ability to suppress the background and focus on the main features. Finally, a dilated convolution module generates the estimated density map. We conduct experiments on five public crowd datasets (UCF_CC_50, WorldExpo’10, ShanghaiTech, Mall, and Beijing BRT), and the results show that our method achieves lower counting errors than existing state-of-the-art methods. In addition, we extend SCFFNet to count other objects, such as vehicles in the HBR_YD vehicle dataset, where the experimental results show that the proposed method produces higher-quality density maps with higher accuracy than previous methods.
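In density-map-based counting, the estimated count for an image is the integral (sum) of its density map, and counting error is commonly reported as the mean absolute error (MAE) and root mean squared error over the test images. The abstract does not state how SCFFNet builds its ground-truth maps or defines its error metrics, so the minimal sketch below assumes these common conventions: one normalized fixed-width Gaussian per annotated head, with MAE and RMSE as the metrics. The function names (`gaussian_density_map`, `count_from_density`, `mae_rmse`) and the `sigma` value are hypothetical illustrations, not the authors' code.

```python
import math
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def gaussian_density_map(height: int, width: int,
                         heads: Sequence[Tuple[int, int]],
                         sigma: float = 1.5) -> Grid:
    """Hypothetical ground-truth builder: one Gaussian per head at (row, col).

    Assumption: fixed sigma. Each kernel is renormalized over the pixels
    inside the image so it contributes exactly 1.0, even at the borders.
    """
    density = [[0.0] * width for _ in range(height)]
    for r0, c0 in heads:
        kernel = [[math.exp(-((r - r0) ** 2 + (c - c0) ** 2) / (2 * sigma ** 2))
                   for c in range(width)] for r in range(height)]
        total = sum(map(sum, kernel))
        for r in range(height):
            for c in range(width):
                density[r][c] += kernel[r][c] / total
    return density


def count_from_density(density: Grid) -> float:
    """The estimated count is the sum (discrete integral) of the density map."""
    return sum(sum(row) for row in density)


def mae_rmse(pred: Sequence[float], truth: Sequence[float]) -> Tuple[float, float]:
    """Mean absolute error and root mean squared error over test images."""
    n = len(pred)
    errors = [p - t for p, t in zip(pred, truth)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mae, rmse


if __name__ == "__main__":
    dm = gaussian_density_map(16, 16, [(3, 4), (8, 8), (15, 0)])
    print(round(count_from_density(dm), 6))  # 3.0: three annotated heads
    print(mae_rmse([10.0, 20.0], [12.0, 17.0]))
```

Because every kernel is normalized to sum to 1, the map's total equals the number of annotations. A network trained to regress such maps therefore yields a count by summation, without detecting individual targets, which is what makes this formulation robust to the heavy occlusion described above.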
