Abstract

The cross-view image translation task aims to generate scene images from arbitrary views. However, because the shapes and contents of different views differ greatly, the quality of the generated images degrades: small objects, such as vehicles, lose their shapes and details, making them structurally inconsistent with the semantic map that guides the generation process. To solve this problem, we propose a novel generative adversarial network based on a local and global information processing module (LAGGAN) to recover image details and structures. The network further combines the input viewpoint image with the target semantic segmentation map to guide the generation of the target image from another viewpoint. The proposed LAGGAN consists of a two-stage generator and a parameter-sharing discriminator, and uses a new local and global information processing (LAG) module to generate high-quality images across views. Moreover, we integrate dilated convolutions into the discriminator to capture global context, which enhances its discriminative ability and further adjusts the LAG module. As a result, most semantic information is preserved, and the details of the target viewpoint images are translated more sharply. Quantitative and qualitative evaluations on the CVUSA and Dayton datasets show that our method, LAGGAN, produces satisfactory perceptual results and is comparable to state-of-the-art methods on the cross-view image translation task.
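The abstract mentions integrating dilated convolutions into the discriminator so that it captures global context. The sketch below illustrates that idea only; it is not the authors' implementation, and the class name, layer widths, and dilation rates are assumptions. The key point is that a dilated kernel covers a larger area without adding parameters, enlarging the discriminator's receptive field.

```python
import torch
import torch.nn as nn

class DilatedDiscriminator(nn.Module):
    """Illustrative PatchGAN-style discriminator with dilated convolutions.

    Dilation widens the receptive field without extra parameters: a 3x3
    kernel with dilation 2 covers a 5x5 area, and with dilation 4 a 9x9
    area. All layer sizes here are illustrative assumptions, not the
    configuration used in LAGGAN.
    """
    def __init__(self, in_channels=3):
        super().__init__()
        self.net = nn.Sequential(
            # Standard downsampling convolution.
            nn.Conv2d(in_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            # Dilated convolutions enlarge the receptive field while
            # keeping the spatial resolution (padding matches dilation).
            nn.Conv2d(64, 128, kernel_size=3, padding=2, dilation=2),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=4, dilation=4),
            nn.LeakyReLU(0.2, inplace=True),
            # One-channel map of patch-wise real/fake scores.
            nn.Conv2d(128, 1, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

d = DilatedDiscriminator()
scores = d(torch.randn(1, 3, 64, 64))
print(scores.shape)  # torch.Size([1, 1, 32, 32])
```

Each spatial location in the output map judges a large, partly global region of the input, which is the effect the abstract attributes to the dilated-convolution discriminator.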

Highlights

  • Image-to-image translation is a task aimed at mapping an image in one domain to another domain

  • Image-to-image translation based on convolutional neural networks (CNNs) [12] requires designing different architectures and objective functions for each translation task

  • To address the above problems, we propose a novel generative adversarial network for the cross-view image translation task, called local and global information guidance (LAGGAN)


Introduction

Image-to-image translation is a task aimed at mapping an image in one domain to another domain. Many image processing tasks can be framed this way, such as single-image super-resolution [1]–[3], semantic image synthesis [4], [5], and image restoration (denoising [6], [7], deraining [8], [9], dehazing [10], etc.). This technology has broad application prospects and has attracted growing research interest in recent years.
