Abstract
Infrared imaging overcomes limitations of visible-light imaging and is widely used in fields such as military reconnaissance and security surveillance. Recent work on infrared target detection aims to preserve both local features and global representations as fully as possible. Compared with visible-light images, however, infrared images suffer from insufficient texture information and coarse boundaries, which make detection harder. To address these issues, this paper enriches the feature maps with additional information cues. Specifically, we propose a multidomain feature fusion object detector (MFFOD) whose backbone feature extraction network consists of a convolutional branch and a fast Fourier transform (FFT) branch. This hybrid-domain representation enables the extraction of both domain-specific information and global high-frequency and low-frequency information with minimal computational overhead. Furthermore, in the intermediate layers of the network, we design a feature injection module that enables comprehensive interaction between channel features and spatial features within a single feature map. Experimental results show that MFFOD achieves average detection accuracies of 88.97%, 90.32%, and 99.25% on three significant infrared scene datasets, outperforming existing target detection methods. We hope that this general detection algorithm will serve as a useful reference for future infrared target detection research.
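To make the two-branch idea concrete, the following is a minimal NumPy sketch of a block that combines a local convolutional path with a global FFT path. It is an illustration under stated assumptions, not the MFFOD backbone itself: the function names (`conv_branch`, `fft_branch`, `fuse`), the per-channel filtering, the element-wise spectral weighting, and the additive fusion are all hypothetical choices made here, since the abstract does not specify these details.

```python
import numpy as np


def conv_branch(x, kernel):
    """Local path (assumed form): zero-padded 'same' 2D cross-correlation per channel.

    x has shape (channels, height, width); kernel has odd height and width.
    """
    _, h, w = x.shape
    kh, kw = kernel.shape
    padded = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(x)
    # Accumulate shifted copies of the input, one per kernel tap.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[:, i:i + h, j:j + w]
    return out


def fft_branch(x, freq_weight):
    """Global path (assumed form): reweight the 2D spectrum, then invert.

    freq_weight has shape (height, width // 2 + 1), matching np.fft.rfft2.
    Every output pixel depends on every input pixel, so the path is global.
    """
    spectrum = np.fft.rfft2(x, axes=(-2, -1))
    return np.fft.irfft2(spectrum * freq_weight, s=x.shape[-2:], axes=(-2, -1))


def fuse(x, kernel, freq_weight):
    """Hypothetical fusion: sum of the local and global paths."""
    return conv_branch(x, kernel) + fft_branch(x, freq_weight)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feature_map = rng.standard_normal((2, 8, 8))
    smoothing = np.full((3, 3), 1.0 / 9.0)   # local 3x3 averaging filter
    low_pass = np.zeros((8, 5))
    low_pass[:2, :2] = 1.0                   # keep a few low frequencies only
    print(fuse(feature_map, smoothing, low_pass).shape)
```

In a trained network the kernel and the spectral weights would be learned parameters. Part of the appeal of such a design is that the FFT path costs O(N log N) in the number of pixels while still giving every output location a global receptive field, which fits the abstract's claim of minimal computational overhead.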