Abstract
Infrared target detection has important applications in rescue and Earth observation. However, the disadvantages of low signal-to-clutter ratios and severe background noise interference for infrared imaging pose great challenges to the detection technology for infrared dim targets. Most algorithms only extract features from the spatial domain, while the lack of temporal information results in unsatisfactory detection performance when the difference between the target and the background is not significant enough. Although some methods utilize temporal information in the detection process, these nonlearning-based methods fail to incorporate the complex and changeable background, and need to adjust parameters according to the input. To tackle this problem, we proposed a Spatio-Temporal Differential Multiscale Attention Network (STDMANet), a learning-based method for multiframe infrared small target detection in this article. Our STDMANet first used the temporal multiscale feature extractor to obtain spatiotemporal (ST) features from multiple time scales and then resorted them to the spatial multiscale feature refiner to enhance the semantics of ST features on the premise of maintaining the position information of small targets. Finally, unlike other learning-based networks that require binary masks for training, we designed a mask-weighted heatmap loss to train the network with only center point annotations. At the same time, the proposed loss can balance missing detection and false alarm, so as to achieve a good balance between finding the targets and suppressing the background. Extensive quantitative experiments on public datasets validated that the proposed STDMANet could improve the metric <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"> <tex-math notation="LaTeX">${F_{1}}$ </tex-math></inline-formula> score up to 0.9744, surpassing the state-of-the-art baseline by 0.1682. Qualitative experiments show the proposed method could stably extract foreground moving targets from video sequences with various backgrounds while reducing false alarm rate better than other recent baseline methods.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: IEEE Transactions on Geoscience and Remote Sensing
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.