Abstract

This study addresses the problem of dense object counting. In dense scenes, large variations in object scale and uneven object distributions severely hinder counting accuracy. Existing methods, whether CNNs with fixed convolutional kernel sizes or Transformers with fixed attention window sizes, struggle to handle such variability effectively. Lower-resolution features respond better to larger objects close to the camera, while higher-resolution features are better suited to smaller objects farther away; preserving the features that carry the most relevant information at each scale is therefore crucial for improving counting precision. Motivated by this, we propose a multi-resolution scale feature fusion-based universal density counting network (MRSNet). It processes high- and low-resolution features with independent modules, adaptively adjusts receptive field sizes, and incorporates dynamic sparse attention mechanisms to refine the feature information at each resolution, then integrates the optimal features across multiple scales into density maps for counting evaluation. The proposed network effectively mitigates the problems caused by large variations in object scale, thereby enhancing counting accuracy. Furthermore, extensive quantitative analyses on six public datasets demonstrate the algorithm's strong generalization ability across diverse object-scale variations.
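To make the fusion idea concrete, the following is a minimal illustrative sketch, not the authors' implementation: density maps predicted at different resolutions are brought to a common resolution (with the total count preserved) and combined by a weighted sum, after which the final count is simply the sum of the fused map. The function names, the nearest-neighbour upsampling, and the fixed fusion weights are all assumptions for illustration; MRSNet learns its fusion adaptively.

```python
# Illustrative sketch of count-preserving multi-resolution density-map
# fusion. A "density map" here is a plain 2-D list of floats whose sum
# equals the estimated object count.

def upsample_nearest(dmap, factor):
    """Nearest-neighbour upsampling of a 2-D density map.

    Each value is divided by factor**2 so that the sum of the map
    (i.e. the count it encodes) is unchanged by upsampling.
    """
    out = []
    for row in dmap:
        expanded = [v / factor ** 2 for v in row for _ in range(factor)]
        # Append independent copies so rows are not aliased.
        out.extend(list(expanded) for _ in range(factor))
    return out


def fuse_counts(dmaps, weights):
    """Weighted fusion of per-scale density maps into a single count.

    `weights` are assumed to sum to 1; each map's count is its sum.
    """
    counts = [sum(sum(row) for row in d) for d in dmaps]
    return sum(w * c for w, c in zip(weights, counts))


# Toy example: a coarse 1x1 map and a fine 2x2 map both encoding 2 objects.
coarse = [[2.0]]
fine = [[1.0, 0.0], [0.0, 1.0]]
fused_count = fuse_counts([upsample_nearest(coarse, 2), fine], [0.5, 0.5])
```

In the real network, per-scale predictions come from CNN/Transformer feature maps and the fusion weights are produced by the adaptive modules; the count-preservation constraint shown in `upsample_nearest` is what makes density maps at different resolutions directly comparable.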
