In response to the issue that the fusion process of infrared and visible images is easily affected by lighting factors, in this paper, we propose an adaptive illumination perception fusion mechanism, which was integrated into an infrared and visible image fusion network. Spatial attention mechanisms were applied to both infrared images and visible images for feature extraction. Deep convolutional neural networks were utilized for further feature information extraction. The adaptive illumination perception fusion mechanism is then integrated into the image reconstruction process to reduce the impact of lighting variations in the fused images. A Median Strengthening Channel and Spatial Attention Module (MSCS) was designed to be integrated into the backbone of YOLOv8. In this paper, we used the fusion network to create a dataset named ivifdata for training the target recognition network. The experimental results indicated that the improved YOLOv8 network saw further enhancements of 2.3%, 1.4%, and 8.2% in the Recall, mAP50, and mAP50-95 metrics, respectively. The experiments revealed that the improved YOLOv8 network has advantages in terms of recognition rate and completeness, while also reducing the rates of false negatives and false positives.