Abstract

Infrared dim targets submerged in complicated backgrounds are challenging to detect because of their small size and faint intensity. Previous attention-based detection networks frequently rely on global long-range dependencies, which spend substantial computation to locate a target whose position is sparse but meaningful. To avoid wasting computation on the background, this paper proposes a detection network in which global context guides local feature learning, named GILNet (Global Induced Local Network). It contains a global location module (GLM) and a local feature interaction module (LFIM), which capture the global position and the features of targets, respectively. More specifically, the GLM uses global context interaction to find the regions that might contain dim small targets, that is, their coarse locations. Within these coarsely located regions, the LFIM then extracts feature information about the targets. We also design an eight-directional attention operation to obtain the contour information of targets in the low-level feature map. This information is fused with the high-level feature map in the multi-directional feature fusion module (MFFM), which retains more semantic and spatial information about the targets. Finally, quantitative and qualitative analyses show that GILNet outperforms eight comparison methods on two public datasets.
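The abstract does not describe how the GLM, LFIM, or eight-directional attention are implemented, so the following is only a minimal NumPy sketch of the general "locate globally, compute locally" idea under loudly stated assumptions. A hand-crafted patch score (peak intensity minus the global mean) stands in for the learned GLM, and a minimum contrast against the eight neighbouring pixels stands in for the learned local features and directional attention. The function names `coarse_locate`, `eight_direction_contrast`, and `global_induced_local` and the parameters `patch` and `k` are hypothetical and do not come from the paper.

```python
import numpy as np

# Hypothetical offsets for the eight directions: N, NE, E, SE, S, SW, W, NW.
DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def coarse_locate(img, patch=8, k=2):
    """Assumed stand-in for a global location step: score every
    non-overlapping patch by its peak intensity above the global mean
    and return the top-left corners of the k best-scoring patches."""
    h, w = img.shape
    mean = img.mean()
    scores = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            scores.append((img[r:r + patch, c:c + patch].max() - mean, r, c))
    scores.sort(reverse=True)
    return [(r, c) for _, r, c in scores[:k]]


def eight_direction_contrast(img):
    """Minimum difference between each pixel and its eight neighbours
    (edge-padded). A small bright blob is positive in all eight
    directions, while edges and flat background are not."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    diffs = [img - padded[1 + dr:1 + dr + h, 1 + dc:1 + dc + w] for dr, dc in DIRECTIONS]
    return np.min(np.stack(diffs), axis=0)


def global_induced_local(img, patch=8, k=2):
    """Compute the local contrast map only inside the coarsely located
    patches; all other pixels stay zero, so no local work is done there."""
    out = np.zeros_like(img, dtype=float)
    for r, c in coarse_locate(img, patch, k):
        out[r:r + patch, c:c + patch] = eight_direction_contrast(img[r:r + patch, c:c + patch])
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.normal(0.2, 0.02, size=(32, 32))  # synthetic noisy background
    img[20, 11] += 1.0                          # one dim point target
    resp = global_induced_local(img)
    print(np.unravel_index(resp.argmax(), resp.shape))
```

In this toy setting only `k * patch**2` pixels receive the eight-direction computation, which mirrors, in a crude hand-crafted way, the motivation of not spending local computation on background regions.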