Due to the high cost of equipment and the constraints of shooting conditions, obtaining aerial infrared images of specific targets is very challenging. Most methods that use Generative Adversarial Networks to translate visible images into infrared images depend heavily on registered data and struggle with the diversity and complexity of scenes containing aerial infrared targets. This paper proposes a one-sided, end-to-end unpaired aerial visible-to-infrared image translation algorithm, termed AerialIRGAN. AerialIRGAN introduces a dual-encoder structure: one encoder, designed on the basis of the Segment Anything Model, extracts deep semantic features from visible images, while the other, designed on the basis of UniRepLKNet, captures small-scale and sparse patterns. AerialIRGAN then constructs a bridging module to deeply integrate the features of the two encoders with their corresponding decoders. Finally, AerialIRGAN proposes a structural appearance consistency loss that guides the synthesized infrared images to preserve the structure of the source image while exhibiting distinct infrared characteristics. Experimental results show that, compared with existing representative infrared image generation algorithms, the proposed method generates higher-quality infrared images and achieves better performance in both subjective visual assessment and objective metric evaluation.
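The abstract does not define the structural appearance consistency loss, but the stated goal (preserve source structure while adopting infrared appearance) can be illustrated with a minimal NumPy sketch. Here the structure term compares image gradient maps and the appearance term compares global intensity statistics; both choices, the function name, and the weight `lam` are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def gradient_magnitude(img):
    # Forward-difference gradients as a simple edge/structure descriptor.
    gy = np.diff(img, axis=0, append=img[-1:, :])
    gx = np.diff(img, axis=1, append=img[:, -1:])
    return np.sqrt(gx ** 2 + gy ** 2)

def structural_appearance_consistency(src_visible, fake_infrared, lam=0.5):
    """Hypothetical stand-in for a structural appearance consistency loss:
    - structure term: L1 distance between gradient maps (layout, edges)
    - appearance term: distance between global intensity statistics
    """
    structure = np.abs(gradient_magnitude(src_visible)
                       - gradient_magnitude(fake_infrared)).mean()
    appearance = abs(src_visible.mean() - fake_infrared.mean())
    return structure + lam * appearance
```

A loss of this shape is minimized when the synthetic image keeps the source's edge layout (structure term near zero) while its overall intensity distribution is free to shift toward infrared-like statistics, penalized only through the weighted appearance term.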