Abstract

Recent image-to-image translation (I2I) approaches have achieved significant success in translating a source image into the style of a target image. Existing techniques rely on the disentanglement of content and style representations, requiring a two-stage style mapping process: reference images are used to extract style vectors, which are subsequently remapped into the translated images. However, when the target domain contains a variety of styles, such a two-stage style mapping cannot guarantee that the translated image is style-consistent with its guiding reference image. In this work, we propose to explicitly employ metric learning to enhance the two-stage style mapping in style-guided image translation. In the first stage, the distance between Gram matrices of deep features is used to construct a visual style metric that provides self-supervised similarity labels, guiding the embedding of style vectors with a triplet loss using adaptive margins. In the second stage, generated images and their corresponding reference images serve as positive samples and anchors for each other, while the nearest negative sample is used to construct the triplet loss in the proposed metric space. The proposed learning algorithms can be applied to any I2I framework that uses disentangled representations, without modifying the original network architectures. We evaluate the proposed method on three representative I2I translation baselines. Both qualitative and quantitative results demonstrate that the proposed approach enhances style alignment in style-guided translation compared to the baselines.
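To make the first-stage idea concrete, the sketch below illustrates one plausible reading of the abstract: a Gram-matrix style distance used as a self-supervised similarity label, and a triplet loss whose margin adapts to the gap between the negative and positive style distances. This is a minimal PyTorch sketch, not the authors' implementation; the helper names (`gram_matrix`, `style_distance`, `adaptive_triplet_loss`) and the particular form of the adaptive margin are assumptions for illustration only.

```python
# Hedged sketch: Gram-matrix style metric + triplet loss with an adaptive margin.
# Function names and the margin formula are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a feature map of shape (B, C, H, W), normalized by its size."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def style_distance(feat_a: torch.Tensor, feat_b: torch.Tensor) -> torch.Tensor:
    """Visual style metric: Frobenius distance between Gram matrices (per batch item)."""
    return (gram_matrix(feat_a) - gram_matrix(feat_b)).flatten(1).norm(dim=1)

def adaptive_triplet_loss(anchor, positive, negative, d_pos, d_neg, scale=1.0):
    """Triplet loss on style vectors; the margin grows with the Gram-distance gap
    between the negative and positive reference images (the adaptive part)."""
    margin = scale * (d_neg - d_pos).clamp(min=0.0)        # adaptive margin per sample
    dist_ap = F.pairwise_distance(anchor, positive)        # anchor-positive embedding distance
    dist_an = F.pairwise_distance(anchor, negative)        # anchor-negative embedding distance
    return F.relu(dist_ap - dist_an + margin).mean()
```

Under this reading, `d_pos` and `d_neg` would come from `style_distance` computed on deep features (e.g., from a pretrained encoder) of the reference images, while `anchor`, `positive`, and `negative` are the style vectors being embedded; the second-stage loss described in the abstract would reuse the same triplet form with generated/reference image pairs as mutual anchors and positives.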
