Infrared dim and small target detection (IRDSTD) aims to locate targets amid background, clutter, and noise. For infrared dim and small targets (IRDSTs) with low signal-to-clutter ratios (SCRs), detection is difficult because their poor local spatial saliency leads to missed detections and false alarms. In this work, a spatiotemporally integrated detection network (STIDNet) is proposed for IRDSTD. In the network, a spatial saliency feature generation module (SSFGM) employs a U-shaped network to extract deep spatial features from the input images frame by frame and concatenates them along the temporal dimension to obtain a spatiotemporal feature tensor. A fixed-weight multiscale motion feature-based 3D convolution kernel (FWMFCK-3D) then reinforces IRDSTs, which exhibit consistent motion direction and strong interframe correlation, and suppresses randomly occurring clutter, noise, and other false alarms. A spatiotemporal feature fusion module (STFFM) fuses the spatial local saliency features and the temporal motion features through 3D convolution, constructing a mapping from the features to the target probability likelihood map. Finally, ablation and comparison experiments demonstrate the strong performance of the proposed network: for infrared dim and small targets with SCRs < 3, the average AUC still reached 0.99786.
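To make the motion-reinforcement idea concrete, the following is a minimal sketch, not the FWMFCK-3D kernel itself. It assumes that per-frame saliency maps are already available (here they are hand-made toy arrays rather than U-shaped network outputs), stacks them along the temporal dimension, and averages the saliency along straight-line trajectories for a small bank of candidate velocities. Each velocity corresponds to a fixed-weight 3D kernel whose nonzero weights lie on one line through the frame stack. The function names, the single-scale velocity bank, and the max-over-directions rule are all illustrative assumptions.

```python
# Hypothetical sketch: names, kernel shape, and the max rule are assumptions
# made for illustration; they are not taken from the paper.

def stack_frames(frames):
    """Stack per-frame 2D saliency maps into a T x H x W nested list."""
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same shape")
    return [[row[:] for row in frame] for frame in frames]


def directional_response(tensor, velocity):
    """Average saliency along a straight trajectory ending at each pixel.

    The trajectory ends in the last frame and steps back by (dy, dx) per
    frame. This equals correlation with a fixed 3D kernel holding weight
    1/T on that line and zero elsewhere (out-of-bounds samples count as 0).
    """
    t_len, height, width = len(tensor), len(tensor[0]), len(tensor[0][0])
    dy, dx = velocity
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            for t in range(t_len):
                steps_back = t_len - 1 - t
                yy, xx = y - dy * steps_back, x - dx * steps_back
                if 0 <= yy < height and 0 <= xx < width:
                    total += tensor[t][yy][xx]
            out[y][x] = total / t_len
    return out


def motion_consistency_map(tensor, velocities):
    """Keep, per pixel, the strongest response over the velocity bank."""
    responses = [directional_response(tensor, v) for v in velocities]
    height, width = len(tensor[0]), len(tensor[0][0])
    return [[max(r[y][x] for r in responses) for x in range(width)]
            for y in range(height)]


def blank(height=5, width=5):
    return [[0.0] * width for _ in range(height)]


# Toy input: a target moving one pixel to the right per frame along row 2,
# plus a one-frame noise spike at (0, 4) in the middle frame.
frames = [blank(), blank(), blank()]
frames[0][2][1] = 1.0
frames[1][2][2] = 1.0
frames[2][2][3] = 1.0
frames[1][0][4] = 1.0

velocities = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
response = motion_consistency_map(stack_frames(frames), velocities)
print(response[2][3], response[0][4])
```

In this toy case the moving target accumulates the full response at its final position, while the isolated spike contributes to at most one frame of any trajectory and is therefore attenuated. A learned fusion stage, such as the STFFM described above, would replace the simple maximum used here.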