Automatically translating chromaticity-free thermal infrared (TIR) images into realistic color visible (CV) images is of great significance for autonomous vehicles, emergency rescue, robot navigation, nighttime video surveillance, and many other fields. Most recent designs use end-to-end neural networks to translate TIR directly to CV; however, compared to CV images, TIR images have low contrast and unclear texture, which makes them a difficult source for translation. Directly translating the single-channel TIR temperature value to the three-channel RGB color value, without additional constraints or semantic information, therefore handles the cross-domain one-to-three mapping problem poorly, producing translated CV images with both blurred edges and confused colors. Since the essential step in TIR-to-CV translation is mapping information from the temperature domain into the color domain, this work proposes an improved CycleGAN (GMA-CycleGAN) that first translates TIR images into grayscale visible (GV) images. Although the two domains have different properties, the numerical mapping is one-to-one, which reduces the color confusion caused by the one-to-three mapping of direct TIR-to-CV translation. A GV-to-CV translation network is then applied to obtain CV images; because colorizing GV images into CV images takes place within the same visible domain, edge blurring is avoided. To enhance the boundary gradient between objects (pedestrians and vehicles) and the background, a mask attention module based on the TIR temperature mask and the CV semantic mask is designed, without increasing the number of network parameters, and added to the feature encoding and decoding convolution layers of the CycleGAN generator. Moreover, a perceptual loss term is added to the original CycleGAN loss function to bring the translated images closer to the real images in feature space. Experiments on the FLIR dataset verify the effectiveness of the proposed method: compared to the state-of-the-art model, the translated CV images have better subjective quality, while the objective metrics improve, with the FID (Fréchet inception distance) reduced by 2.42 and the PSNR (peak signal-to-noise ratio) improved by 1.43.
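To make the two additions described above concrete, the following is a minimal PyTorch sketch, not the authors' implementation: a parameter-free mask attention that re-weights generator feature maps with a resized object mask, and a VGG-based perceptual loss term added to the CycleGAN objective. The residual weighting (1 + mask), the mask preprocessing, and the choice of VGG-16 features up to relu3_3 are illustrative assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

def mask_attention(features: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    # Parameter-free attention: resize the object mask (pedestrian/vehicle
    # vs. background, values in [0, 1]) to the feature-map size and use it
    # to re-weight activations, boosting object regions while keeping the
    # background. The (1 + mask) weighting is an assumption for illustration.
    mask = F.interpolate(mask, size=features.shape[-2:], mode="nearest")
    return features * (1.0 + mask)

# Frozen VGG-16 feature extractor (up to relu3_3) for the perceptual term.
# ImageNet normalization of the inputs is omitted here for brevity.
_vgg = vgg16(weights="DEFAULT").features[:16].eval()
for p in _vgg.parameters():
    p.requires_grad_(False)

def perceptual_loss(fake_cv: torch.Tensor, real_cv: torch.Tensor) -> torch.Tensor:
    # Distance between deep features of the translated and real images,
    # pulling the translation closer to the real image in feature space.
    return F.mse_loss(_vgg(fake_cv), _vgg(real_cv))

# Hypothetical combined objective: the original CycleGAN losses plus the
# perceptual term, weighted by a tunable hyperparameter lambda_p.
# total_loss = cyclegan_loss + lambda_p * perceptual_loss(fake_cv, real_cv)
```

Multiplying by (1 + mask) rather than by the mask alone preserves background features while sharpening the object-background boundary, and, consistent with the abstract, the module introduces no learnable parameters.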