Abstract

Crowd counting is a fundamental yet challenging task that requires rich information to generate a pixel-level crowd density map. Advances in thermal sensing and its application to computer vision have also made thermal information available for crowd counting. Considering the complementary characteristics of RGB (red–green–blue) and thermal images at different feature-encoding stages, we propose a cross-modality grade interaction network (CGINet) for RGB-T (RGB and thermal) crowd counting. We introduce an RGB cooperative enhancement module for thermal information, which helps extract low-level features correctly from scenes containing objects at different scales. Because RGB information is sensitive to lighting and occlusion when high-level features are extracted, we propose a thermal information supplementary module to make the RGB features more robust. In addition, a novel multilayer decoding module integrates the features from different levels and predicts the crowd density map. Comprehensive experiments on the RGBT-CC benchmark demonstrate the effectiveness of CGINet for RGB-T crowd counting. CGINet also achieves excellent results on the ShanghaiTechRGBD dataset, which contains paired RGB images and depth maps. These results highlight the strength of the CGINet architecture and its generalization ability for multimodality crowd counting.
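The abstract frames crowd counting as predicting a pixel-level density map. The sketch below illustrates that general formulation only, not CGINet itself: a ground-truth density map is built by placing one normalized Gaussian at each annotated head, the count is the sum of the map, and predictions are scored with MAE and RMSE, which are common counting metrics. The fixed kernel width `sigma` and the function names `gaussian_density_map`, `count_from_density`, and `mae_rmse` are hypothetical choices made for illustration.

```python
import math


def gaussian_density_map(points, height, width, sigma=2.0):
    """Hypothetical helper: place one 2-D Gaussian at each head position (x, y).

    Each kernel is renormalized over the image grid so it contributes exactly
    one person to the total, even when it is clipped at the image border.
    The fixed sigma is an assumption; adaptive kernels are also common.
    """
    density = [[0.0] * width for _ in range(height)]
    for px, py in points:
        kernel = [
            [math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        total = sum(map(sum, kernel))
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density


def count_from_density(density):
    """The estimated crowd count is the integral (sum) of the density map."""
    return sum(map(sum, density))


def mae_rmse(predicted, ground_truth):
    """Mean absolute error and root-mean-square error over per-image counts."""
    errors = [p - g for p, g in zip(predicted, ground_truth)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


if __name__ == "__main__":
    heads = [(5, 5), (20, 10), (31, 23)]  # toy (x, y) head annotations
    dm = gaussian_density_map(heads, height=24, width=32)
    print(round(count_from_density(dm), 6))
    print(mae_rmse([2.5, 4.0], [3.0, 3.0]))
```

For the three toy annotations on a 24×32 grid, the map sums to 3.0 up to floating-point rounding. A network such as CGINet would replace the hand-built map with a predicted one, while the counting and evaluation steps stay the same.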
