Abstract
Crowd counting and localisation are essential tasks in crowd analysis and are vital to ensuring public safety. However, performing these tasks from UAVs introduces obstacles not encountered in fixed video surveillance (e.g. viewpoint and scale variations, background clutter, and small targets). To overcome these difficulties, this research presents a novel network named PDNet, which employs multi-task learning to combine point regression and density map regression. PDNet comprises a backbone that extracts multi-scale features, a Dilated Feature Fusion module (DFF), a Density Map Attention module (DMA), a density map branch and a point branch. DFF addresses the difficulties of small targets and scale variations by establishing relationships between targets and their surroundings. DMA addresses the challenge of complicated backgrounds, allowing PDNet to focus on the targets' locations. The density map branch and the point branch perform density map regression and point regression, respectively. Experiments on the DroneCrowd dataset demonstrate that the proposed network outperforms state-of-the-art approaches in localisation, achieving an L-mAP of 53.85%, L-AP@10 of 59.14%, L-AP@15 of 63.64%, and L-AP@20 of 66.21%, while also improving counting performance and significantly reducing inference time. In addition, ablation experiments are conducted to demonstrate the effectiveness of each module.
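As an illustration of how point regression and density map regression might be combined in a single multi-task objective, the sketch below sums a density map error and a point error. The abstract does not state PDNet's actual losses, so everything here is an assumption made for illustration: the function names, the choice of mean squared error and mean Euclidean distance, the `weight` parameter, and the premise that predicted points have already been matched one-to-one with ground-truth points.

```python
import math

def density_mse(pred_map, gt_map):
    """Mean squared error between two density maps given as 2-D lists."""
    total, n = 0.0, 0
    for pred_row, gt_row in zip(pred_map, gt_map):
        for p, g in zip(pred_row, gt_row):
            total += (p - g) ** 2
            n += 1
    return total / n if n else 0.0

def point_distance(pred_pts, gt_pts):
    """Mean Euclidean distance between already-matched (x, y) point pairs."""
    dists = [math.dist(p, g) for p, g in zip(pred_pts, gt_pts)]
    return sum(dists) / len(dists) if dists else 0.0

def multitask_loss(pred_map, gt_map, pred_pts, gt_pts, weight=1.0):
    """Hypothetical joint objective: density term plus weighted point term."""
    return density_mse(pred_map, gt_map) + weight * point_distance(pred_pts, gt_pts)

# Toy example: a 2x2 density map and one matched point pair.
loss = multitask_loss(
    pred_map=[[0.0, 1.0], [0.0, 0.0]],
    gt_map=[[0.0, 0.0], [0.0, 0.0]],
    pred_pts=[(0.0, 0.0)],
    gt_pts=[(3.0, 4.0)],
    weight=0.5,
)
```

In a real training setup the two terms would be computed on tensors by a deep learning framework, and the relative weight would be tuned; the plain-list version is only meant to make the structure of a joint objective concrete.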