Abstract

For crowd counting, existing methods usually adopt an end-to-end approach that directly outputs the final estimated density map and derives the count from it. However, as an intermediate representation, the quality of the estimated density map can significantly affect counting performance. Some studies have therefore attempted to optimize the estimated density map with an additional attention mechanism, but these methods focus only on high-density crowd regions and neglect the refinement of local details. We therefore propose a more intuitive and understandable Density Map Dynamic Refinement Network (DDRNet), consisting of a Counter and a Refiner, to further refine the local detail information of the estimated density map. Training proceeds in two stages. In the first stage, the Counter generates an initial density map through its feature extraction module and backend. In the second stage, the Refiner, which consists of convolutional layers with different dilation rates, refines the Counter's output to obtain the final estimated density map. Moreover, because the Counter and the Refiner have different views during training, we design a dynamic joint training strategy to improve counting performance. Extensive experiments on three crowd counting datasets (ShanghaiTech, UCF_CC_50, UCF-QNRF) demonstrate the effectiveness of the proposed model, which achieves superior counting results.
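To make two ideas in the abstract concrete (the count is read off a density map, and a refinement stage built from convolutions with different dilation rates post-processes that map), here is a minimal, framework-free Python sketch. It is not DDRNet. The abstract does not specify the Refiner's depth, kernel sizes, whether the dilated layers run in parallel or in sequence, or how the dynamic joint training works. The helper names (`dilated_conv2d`, `refine`, `count`), the dilation rates (1, 2, 3), the single shared 3×3 kernel, and the branch averaging with a non-negativity clamp are all hypothetical choices made only for illustration.

```python
# Framework-free sketch: dilated 2-D convolution and a hypothetical "refiner".
# ASSUMPTIONS (not from the paper): parallel branches with dilations (1, 2, 3),
# one shared odd-sized square kernel, branch outputs averaged and clamped at 0.
from typing import List, Sequence

Grid = List[List[float]]


def dilated_conv2d(x: Grid, kernel: Grid, dilation: int) -> Grid:
    """Same-size 2-D cross-correlation with zero padding and the given dilation."""
    h, w = len(x), len(x[0])
    k = len(kernel)  # assumed square and odd-sized, e.g. 3x3
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    # Kernel taps are spaced `dilation` pixels apart.
                    ii = i + (a - r) * dilation
                    jj = j + (b - r) * dilation
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside
                        acc += kernel[a][b] * x[ii][jj]
            out[i][j] = acc
    return out


def refine(density: Grid, kernel: Grid, dilations: Sequence[int] = (1, 2, 3)) -> Grid:
    """Hypothetical refiner: average the dilated branches, keep densities >= 0."""
    branches = [dilated_conv2d(density, kernel, d) for d in dilations]
    h, w = len(density), len(density[0])
    return [
        [max(0.0, sum(br[i][j] for br in branches) / len(branches)) for j in range(w)]
        for i in range(h)
    ]


def count(density: Grid) -> float:
    """Estimated crowd count: the sum (integral) of the density map."""
    return sum(sum(row) for row in density)


if __name__ == "__main__":
    impulse = [[0.0] * 7 for _ in range(7)]
    impulse[3][3] = 1.0  # one "person" at the centre of a 7x7 map
    identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
    print(count(refine(impulse, identity)))  # 1.0
```

Larger dilation rates enlarge the receptive field without adding weights, which is the usual motivation for mixing several rates. In a trained network these kernels would be learned; in PyTorch, for example, `nn.Conv2d(c_in, c_out, kernel_size=3, padding=d, dilation=d)` preserves the spatial size for dilation `d`, matching the zero-padded, size-preserving behavior the sketch implements by hand.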
