RGB-D Crowd Counting With Cross-Modal Cycle-Attention Fusion and Fine-Coarse Supervision

He Li, Shihui Zhang, Weihang Kong

doi:10.1109/tii.2022.3171352

Abstract

To counter the negative effect of arbitrary crowd distribution on the counting task, this article proposes a novel RGB-D crowd counting approach comprising a cross-modal cycle-attention fusion (CmCaF) model and a fine-coarse (FC) supervision scheme. At the feature level, the CmCaF model combines RGB and depth features in a cycle-attention manner to model the crowd distribution effectively. At the supervision level, FC supervision optimizes the counting model at both the fine pixel-aware level and the coarse region-aware level, enhancing its sensitivity to the overall crowd distribution and to instance locations. Extensive evaluations on benchmarks demonstrate the feasibility of the proposed approach for RGB-D crowd counting, as well as for RGB and RGB-T counting. The ablation study demonstrates the effectiveness of the main components for both the feature representation of cross-modal data and the accurate estimation of the crowd distribution.
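To make the idea of supervising at two granularities concrete, the following is a minimal sketch and not the authors' formulation, which the abstract does not specify. It assumes the model outputs a density map, that the fine term is a pixel-wise mean squared error, and that the coarse term is a mean absolute error between per-region counts obtained by sum-pooling. The names `region_counts`, `fine_coarse_loss`, `block`, and `weight` are hypothetical.

```python
import numpy as np


def region_counts(density, block):
    """Sum-pool a 2-D density map over non-overlapping block x block regions."""
    h, w = density.shape
    assert h % block == 0 and w % block == 0, "map must tile evenly into blocks"
    # Split each axis into (number of blocks, block size) and sum inside each block.
    return density.reshape(h // block, block, w // block, block).sum(axis=(1, 3))


def fine_coarse_loss(pred, gt, block=4, weight=1.0):
    """Illustrative two-level loss (assumed form, not taken from the paper)."""
    # Fine, pixel-aware term: penalizes per-pixel density errors.
    fine = np.mean((pred - gt) ** 2)
    # Coarse, region-aware term: penalizes errors in per-region counts.
    coarse = np.mean(np.abs(region_counts(pred, block) - region_counts(gt, block)))
    return fine + weight * coarse


# Toy ground truth: two people, one in the top-left region, one in the bottom-right.
gt = np.zeros((8, 8))
gt[1, 1] = 1.0
gt[6, 6] = 1.0

# Prediction A misplaces one person by a pixel but keeps it in the same region.
pred_same_region = gt.copy()
pred_same_region[1, 1] = 0.0
pred_same_region[2, 2] = 1.0

# Prediction B moves that person into a different region.
pred_other_region = gt.copy()
pred_other_region[1, 1] = 0.0
pred_other_region[1, 5] = 1.0

loss_same = fine_coarse_loss(pred_same_region, gt)
loss_other = fine_coarse_loss(pred_other_region, gt)
```

In this toy case both predictions have the same pixel-level error, but only the second changes the per-region counts, so the coarse term separates them. This illustrates why a region-level term can complement pixel-level supervision under the assumptions above.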

Full Text