In this paper, we make the first research effort to address the RGB-Thermal (RGB-T) crowd counting problem in a decision-level late-fusion manner. Different from existing pixel-level or feature-level fusion methods, our method fuses the density maps produced by the RGB and thermal branches via spatially adaptive weighting with RGB illumination-aware attention. Our key intuition for fusing RGB-T density maps is twofold. First, compared with the raw RGB-T images or convolutional feature maps, RGB-T density maps carry stronger counting-related semantics. Second, they retain high spatial resolution, revealing fine local details. To fuse them adaptively, we generate a spatial weighting map for each modality, together with an illumination-related RGB weight. In this way, RGB illumination awareness and local counting pattern characterization are addressed jointly. To the best of our knowledge, we are the first to tackle RGB-T crowd counting by considering these two issues in a unified way. Meanwhile, cross-modality feature interaction is conducted between the RGB and thermal modalities to facilitate spatial weighting map generation. Experiments on two well-established RGB-T crowd counting datasets (i.e., RGBT-CC and DroneRGBT) demonstrate the superiority of our method. The source code and pretrained models will be released upon acceptance at https://github.com/hustaia/DLF-IA.
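The described fusion can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function name, the softmax-style per-pixel normalization, and the way the scalar illumination weight scales the RGB spatial weights are all assumptions; in the actual method the weight maps and illumination weight are produced by learned networks.

```python
# Illustrative sketch of decision-level RGB-T density-map fusion.
# All names and the normalization scheme are hypothetical assumptions;
# the paper's weight maps come from learned modules, not raw inputs.
import numpy as np

def fuse_density_maps(d_rgb, d_thermal, w_rgb, w_thermal, illum):
    """Fuse two density maps with spatially adaptive weights.

    d_rgb, d_thermal : (H, W) density maps from each modality
    w_rgb, w_thermal : (H, W) unnormalized spatial weight maps
    illum            : scalar in [0, 1], RGB illumination-aware weight
    """
    # Scale the RGB spatial weights by the illumination weight, so that
    # under poor illumination (illum -> 0) the thermal branch dominates.
    a = np.exp(w_rgb) * illum
    b = np.exp(w_thermal)
    # Per-pixel normalization so the two weights sum to 1 everywhere.
    return (a * d_rgb + b * d_thermal) / (a + b)
```

With this formulation, setting the illumination weight to zero recovers the thermal density map exactly, while intermediate values blend the two modalities pixel by pixel.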