Abstract

CNN-based crowd counting methods have made great progress in recent years. However, most of them do not make full use of contextual information, which combines high-level semantic features and low-level detail features from the different receptive fields of a CNN. Rich contextual information is essential for handling the scale variation problem in crowd counting, so neglecting it reduces the accuracy of previous CNN-based methods. To address this problem, we propose an adaptive attention fusion mechanism (AAFM) that effectively exploits multi-scale features from different receptive fields of the CNN. It integrates a convolutional network for feature learning with an attention mechanism for multi-scale feature fusion. We use the first 13 convolutional layers of VGG-16 as the encoder module to extract base features, which are then fed into the decoder module. The decoder mainly consists of a Density Regression Branch (DRB) and a Feature Fusion Branch (FFB). The DRB uses multiple convolutional layers for feature learning and multi-scale feature extraction; the FFB uses attention modules to model multi-scale features and element-wise multiplication to fuse them. As a result, AAFM incorporates rich contextual information into the encoder-decoder framework to generate high-quality crowd density maps and accurate counts. Experiments on the ShanghaiTech, UCF-CC-50, and UCF-QNRF datasets show that AAFM achieves promising results.
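As a concrete illustration of the encoder described above, the sketch below shows one way to take the first 13 convolutional layers of VGG-16 as a feature extractor in PyTorch. It is a minimal sketch, not the authors' code: the exact truncation point (whether the final pooling layer is kept) and the use of randomly initialized rather than pretrained weights are assumptions made for illustration.

```python
# Minimal PyTorch sketch of the VGG-16 encoder described above.
# Assumption: truncation stops before the last max-pooling layer; pretrained
# ImageNet weights would normally be loaded instead of weights=None.
import torch
import torch.nn as nn
from torchvision import models


def build_vgg16_encoder() -> nn.Sequential:
    """Return the first 13 convolutional layers of VGG-16 (with their ReLUs and pools)."""
    vgg = models.vgg16(weights=None)
    # vgg.features[:30] covers all 13 conv layers and stops before the final pool.
    return nn.Sequential(*list(vgg.features.children())[:30])


if __name__ == "__main__":
    encoder = build_vgg16_encoder()
    x = torch.randn(1, 3, 384, 384)      # a dummy RGB crowd image
    base_features = encoder(x)           # 512-channel base feature map
    print(base_features.shape)           # torch.Size([1, 512, 24, 24])
```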

Highlights

  • Crowd counting is a fundamental problem in crowd analysis and scene understanding

  • To address the scale variation problem in crowd counting, we propose an adaptive attention fusion mechanism (AAFM)

  • We present the results of AAFM on crowd counting and crowd localization


Summary

INTRODUCTION

Crowd counting is a fundamental problem in crowd analysis and scene understanding. The motivation behind AAFM is to fuse the multi-scale features of a neural network and obtain rich contextual information. The decoder module mainly contains the Density Regression Branch (DRB) and the Feature Fusion Branch (FFB); it obtains rich contextual information through the feature learning of convolutional layers and the fusion of multi-scale features. The DRB consists of multiple 3 × 3 convolutional layers and multiple upsampling layers, and it recovers crowd density information through supervised learning. The FFB consists of multiple attention modules, multiple 1 × 1 convolutional layers, and upsampling layers, and it obtains rich contextual information by fusing multi-scale features. In summary, we propose an attention fusion neural network (AAFM) for crowd counting in which the FFB fuses multi-scale features to obtain rich contextual information; it models head regions effectively and alleviates counting errors caused by scale variation.
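To make the two decoder branches more concrete, here is a hedged PyTorch sketch that mirrors the description above: a DRB built from 3 × 3 convolutions interleaved with upsampling, and an FFB built from 1 × 1 convolutions and upsampling that produces a spatial attention map, combined with the DRB output by element-wise multiplication. The layer counts, channel widths, sigmoid normalization of the attention map, and the final 1 × 1 convolution that produces the density map are illustrative assumptions, not the configuration reported in the paper.

```python
# Hedged sketch of the decoder branches; layer counts and channel widths are
# illustrative assumptions, not the authors' exact configuration.
import torch
import torch.nn as nn


class DensityRegressionBranch(nn.Module):
    """Stack of 3x3 convolutions with upsampling, producing density features."""
    def __init__(self, in_channels: int = 512):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 128, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 64, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.layers(x)


class FeatureFusionBranch(nn.Module):
    """1x1 convolutions plus upsampling that produce a spatial attention map."""
    def __init__(self, in_channels: int = 512):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=1),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, 64, kernel_size=1),
            nn.Sigmoid(),  # assumed normalization of the attention weights
        )

    def forward(self, x):
        return self.layers(x)


if __name__ == "__main__":
    base = torch.randn(1, 512, 24, 24)                 # base features from the VGG-16 encoder
    density_feat = DensityRegressionBranch()(base)     # (1, 64, 96, 96)
    attention = FeatureFusionBranch()(base)            # (1, 64, 96, 96)
    fused = density_feat * attention                   # element-wise fusion of the two branches
    density_map = nn.Conv2d(64, 1, kernel_size=1)(fused)  # single-channel crowd density map
    print(density_map.shape)                           # torch.Size([1, 1, 96, 96])
```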

RELATED WORK
ADAPTIVE ATTENTION FUSION MECHANISM
EXPERIMENT
DATASETS AND EVALUATION METRICS
IMPLEMENTATION DETAILS
CROWD COUNTING
CONCLUSION
