Abstract

Cross-modal image generation is an important aspect of multi-modal learning. Existing methods usually rely on semantic features to reduce the modality gap. Although these methods have achieved notable progress, they still have some limitations: (1) they usually use single-modality information to learn the semantic features; (2) they require the training data to be paired. To overcome these problems, we propose a novel semi-supervised cross-modal image generation method consisting of two semantic networks and one image generation network. Specifically, in the semantic networks, we use the image modality to assist the non-image modality in semantic feature learning via a deep mutual learning strategy. In the image generation network, we introduce an additional discriminator to reduce the image reconstruction loss. By leveraging large amounts of unpaired data, our method can be trained in a semi-supervised manner. Extensive experiments demonstrate the effectiveness of the proposed method.
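The deep mutual learning strategy mentioned in the abstract typically trains two networks that mimic each other's predictive distributions via a symmetric pair of KL-divergence terms. The sketch below is a minimal, framework-free illustration of that loss pairing; the function and variable names (`mutual_learning_losses`, `logits_img`, `logits_nonimg`) are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    # KL(p || q), with a small epsilon for numerical safety.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def mutual_learning_losses(logits_img, logits_nonimg):
    """Symmetric mimicry losses in the style of deep mutual learning:
    each branch is pulled toward the other branch's (detached) prediction.
    These terms would be added to each branch's own task loss."""
    p_img = softmax(logits_img)
    p_non = softmax(logits_nonimg)
    loss_img = kl(p_non, p_img).mean()  # image branch mimics non-image branch
    loss_non = kl(p_img, p_non).mean()  # non-image branch mimics image branch
    return loss_img, loss_non
```

When the two branches agree exactly, both mimicry losses vanish; any disagreement produces a positive penalty on each side, which is what couples the two semantic networks during training.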
