Abstract

Existing methods often improve density-map and crowd-count accuracy by training on paired RGB and depth images. However, these methods do not capture and fuse the complementary features of RGB-D data well. To address this problem, we propose a collaborative cross-modal attention network, named CCANet, for accurate RGB-D crowd counting. CCANet consists mainly of a collaborative cross-modal attention module (CCAM) and a collaborative cross-modal fusion module (CCFM). Specifically, CCAM adaptively interleaves RGB and depth information through channel and spatial cross-modal attention to fully capture the complementary features of the two modalities, and CCFM adaptively integrates these features by weighing their importance. Extensive experiments on the ShanghaiTechRGBD and MICC benchmarks demonstrate the effectiveness of CCANet for RGB-D crowd counting. Moreover, CCANet generalizes to multimodal crowd counting more broadly, achieving superior counting performance on the RGBT-CC benchmark.
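To make the idea of cross-modal attention and weighted fusion concrete, the sketch below illustrates the general pattern in NumPy: each modality's feature map is reweighted by channel and spatial descriptors computed from the *other* modality, and the attended features are then combined by an importance weight. This is a minimal, hedged illustration of the concept only; the function names, the simple sigmoid gating, and the fixed fusion weight `alpha` are assumptions for exposition, not the learned layers of the actual CCAM/CCFM modules described in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_cross_attention(feat_a, feat_b):
    """Reweight feat_a's channels using a descriptor from feat_b.

    feat_*: (C, H, W) feature maps. The parameter-free gating below is a
    toy stand-in for the learned channel attention in CCAM.
    """
    desc = feat_b.mean(axis=(1, 2))        # global average pool -> (C,)
    gate = sigmoid(desc - desc.mean())     # squash to (0, 1)
    return feat_a * gate[:, None, None]

def spatial_cross_attention(feat_a, feat_b):
    """Reweight feat_a's spatial positions using feat_b's activation map."""
    desc = feat_b.mean(axis=0)             # channel-wise average -> (H, W)
    gate = sigmoid(desc - desc.mean())
    return feat_a * gate[None, :, :]

def fuse(rgb, depth, alpha=0.5):
    """Combine cross-attended RGB and depth features (CCFM-style).

    alpha is a fixed weight standing in for the learned importance
    weighting described in the abstract.
    """
    rgb_att = spatial_cross_attention(channel_cross_attention(rgb, depth), depth)
    depth_att = spatial_cross_attention(channel_cross_attention(depth, rgb), rgb)
    return alpha * rgb_att + (1.0 - alpha) * depth_att
```

In a real network the gates would be produced by learned layers and the fusion weights predicted per sample, but the data flow, where each modality attends to the other before a weighted merge, follows the same shape.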

