Abstract

To mitigate the negative effect of arbitrary crowd distribution on the counting task, this article proposes a novel RGB-D crowd counting approach comprising a cross-modal cycle-attention fusion (CmCaF) model and a fine-coarse (FC) supervision scheme. At the feature level, the CmCaF model combines RGB and depth features in a cycle-attention manner to model the crowd distribution effectively. At the supervision level, FC supervision optimizes the counting model at both the fine, pixel-aware level and the coarse, region-aware level, enhancing its sensitivity to the overall crowd distribution and to instance locations. Extensive evaluations on benchmarks illustrate the feasibility of the proposed approach for RGB-D crowd counting, as well as for RGB and RGB-T counting. The ablation study demonstrates the effectiveness of its main components for both the feature representation of cross-modal data and the accurate estimation of the crowd distribution.
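To make the fine and coarse supervision levels concrete, the following is a minimal sketch of one way a pixel-aware term and a region-aware term could be combined on a predicted density map. It is an illustration under stated assumptions, not the formulation used in the article: the function names `region_counts` and `fine_coarse_loss`, the non-overlapping grid of regions, the choice of squared error for the fine term and absolute error for the coarse term, and the `cell` and `weight` parameters are all hypothetical.

```python
from typing import List

Map = List[List[float]]  # a 2-D density map stored as rows of floats


def region_counts(density: Map, cell: int) -> Map:
    """Sum a density map over non-overlapping cell x cell regions.

    Regions on the right and bottom borders are clipped to the map size.
    """
    h, w = len(density), len(density[0])
    return [
        [
            sum(
                density[y][x]
                for y in range(top, min(top + cell, h))
                for x in range(left, min(left + cell, w))
            )
            for left in range(0, w, cell)
        ]
        for top in range(0, h, cell)
    ]


def fine_coarse_loss(pred: Map, target: Map, cell: int = 2, weight: float = 1.0) -> float:
    """Hypothetical combined loss: pixel-level term plus region-level term.

    Assumption: the article's actual loss terms may differ from both choices here.
    """
    h, w = len(pred), len(pred[0])

    # Fine term: mean squared error over individual pixels (location-sensitive).
    fine = sum(
        (pred[y][x] - target[y][x]) ** 2 for y in range(h) for x in range(w)
    ) / (h * w)

    # Coarse term: mean absolute error between per-region counts
    # (sensitive to how the crowd is distributed across regions).
    pred_regions = region_counts(pred, cell)
    target_regions = region_counts(target, cell)
    rows, cols = len(pred_regions), len(pred_regions[0])
    coarse = sum(
        abs(pred_regions[i][j] - target_regions[i][j])
        for i in range(rows)
        for j in range(cols)
    ) / (rows * cols)

    return fine + weight * coarse


# One person, predicted one pixel away from the true location.
target = [[1.0, 0.0], [0.0, 0.0]]
pred = [[0.0, 1.0], [0.0, 0.0]]
loss_one_region = fine_coarse_loss(pred, target, cell=2)
loss_per_pixel_regions = fine_coarse_loss(pred, target, cell=1)
```

In this toy case the region count is correct when the whole map is a single region, so only the fine term penalizes the misplaced instance; shrinking the regions makes the coarse term penalize it as well. This is the sense in which the two levels are complementary in the sketch.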

