Abstract

Most existing crowd counting approaches rely on the limited information in RGB (red–green–blue) images and fail to reliably detect potential pedestrians in unconstrained scenarios. Complementary depth maps, moreover, provide little information about where people are likely to be present. By incorporating both optical and thermal information, however, pedestrian recognition can be enhanced considerably: thermal imaging is robust to weather and lighting conditions, and target information can be extracted even at nighttime. Combining RGB and thermal imaging information, we propose a dual-branch enhanced feature fusion network (DEFNet) for RGB-T (RGB and thermal) crowd counting. In DEFNet, an intensive data-enhancement module fuses complementary features of the same size from the RGB and thermal modalities, combining rich and varied receptive fields to generate powerful fused RGB-T features. These features describe both spatial structure and appearance detail, highlighting crowd location information. An efficient dilation fusion module then applies convolutions to the RGB-T features to obtain flexible and specific features, effectively suppressing the influence of background on the crowd information used for density map prediction. Finally, a fusion decoding module combines high- and low-level features to efficiently produce density maps. Experimental results on an RGB-T crowd counting dataset indicate that the proposed DEFNet outperforms existing approaches. Furthermore, DEFNet generalizes to RGB and depth (RGB-D) data.
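To illustrate the two ideas the abstract names, the sketch below shows (a) a dilated 2D convolution, which enlarges the receptive field without extra parameters, and (b) a simple element-wise fusion of same-size RGB and thermal feature maps. This is a minimal NumPy illustration of the general techniques, not the paper's actual modules; the fusion rule (`sum + product`) and all function names are assumptions for demonstration.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation):
    """Valid-mode 2D convolution of a single-channel map with a dilated kernel.

    With dilation d, a k x k kernel covers an effective window of
    (k - 1) * d + 1 pixels per side, enlarging the receptive field
    without adding parameters (the idea behind dilation fusion modules).
    """
    kh, kw = kernel.shape
    eff_h = (kh - 1) * dilation + 1  # effective window height
    eff_w = (kw - 1) * dilation + 1  # effective window width
    H, W = x.shape
    out = np.zeros((H - eff_h + 1, W - eff_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Sample the input at dilated (strided) positions inside the window.
            patch = x[i:i + eff_h:dilation, j:j + eff_w:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

def fuse_rgb_t(rgb_feat, t_feat):
    """Hypothetical fusion of same-size RGB and thermal feature maps:
    element-wise sum plus element-wise product, so complementary
    modalities reinforce each other where both respond strongly."""
    return rgb_feat + t_feat + rgb_feat * t_feat
```

For example, a 3x3 kernel with dilation 2 spans a 5x5 window, so a 5x5 input yields a single output value; in a real network several dilation rates would be applied in parallel and their outputs fused.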
