Traditional object counting systems count objects by detecting each one individually. However, when objects are small and densely crowded, object detection can fail, which leads to inaccurate counts. To address this issue, we propose a crowded object counting system based on density map estimation. Most density map estimation models use encoder–decoder or multi-branch architectures to generate feature maps at different scales and derive an accurate density map from them; nevertheless, improving the accuracy of crowded object counting remains a challenge. In this paper, we propose a novel model that generates more accurate density maps by using the context-aware network as its main architecture and integrating a self-attention mechanism. This paper makes three main contributions. First, a self-attention mechanism is employed to improve the accuracy of density map estimation. Second, the vehicle labels missing from the TRANCOS dataset are added, making the ground truth more complete than that of the original TRANCOS dataset and thus enabling the proposed model to achieve higher crowded object counting accuracy. Third, the parameters of the self-attention mechanism are analyzed to find the optimal parameter combination. Experimental results show that the crowded object counting accuracy reaches 85.9%, 90.0%, 83.4%, and 92.6% on the TRANCOS, relabeled TRANCOS, ShanghaiTech Part A, and ShanghaiTech Part B datasets, respectively. Furthermore, an ablation study of the context-aware network with the self-attention mechanism is used to analyze the optimal parameter combination.
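To illustrate why density map estimation avoids separating individual objects, the sketch below builds a ground-truth density map and reads the count back from it. It assumes that each object is annotated with a single point, as is common in counting datasets, and that each point is spread into a fixed-width Gaussian kernel of unit mass, so the count equals the sum of the map. The function names (`gaussian_kernel`, `density_map`, `count_from_density`) and the kernel settings (`sigma=4.0`, a 15 × 15 window) are hypothetical; the abstract does not state how the paper generates its ground truth, and the context-aware network with self-attention that would regress such maps is not shown.

```python
from typing import Sequence, Tuple

import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a (size, size) 2-D Gaussian kernel normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    k = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return k / k.sum()


def density_map(
    points: Sequence[Tuple[int, int]],
    shape: Tuple[int, int],
    sigma: float = 4.0,  # hypothetical kernel width, not taken from the paper
    size: int = 15,      # hypothetical window size (odd), not taken from the paper
) -> np.ndarray:
    """Place one unit-mass Gaussian at each annotated (row, col) point.

    Points must lie inside the image. Near the border, the part of the kernel
    that falls outside the image is cropped and the remainder is renormalized,
    so every annotated object still contributes exactly 1 to the total.
    """
    h, w = shape
    dmap = np.zeros((h, w), dtype=np.float64)
    k = gaussian_kernel(size, sigma)
    r = size // 2
    for y, x in points:
        # Image window covered by the kernel, clipped to the image bounds.
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        # Matching crop of the kernel (kernel row i maps to image row y - r + i).
        patch = k[y0 - (y - r):y1 - (y - r), x0 - (x - r):x1 - (x - r)]
        dmap[y0:y1, x0:x1] += patch / patch.sum()
    return dmap


def count_from_density(dmap: np.ndarray) -> float:
    """The estimated count is the integral (sum) of the density map."""
    return float(dmap.sum())


if __name__ == "__main__":
    # Two overlapping objects plus two objects at opposite image corners.
    pts = [(10, 12), (11, 14), (0, 0), (63, 63)]
    dmap = density_map(pts, (64, 64))
    print(round(count_from_density(dmap), 6))
```

Even though the kernels of the first two points overlap heavily, their masses still add up, so the sum recovers the count without separating the objects. In a density-based counter, the network would be trained to predict such a map from the image, and the same summation would give the predicted count.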