Abstract

Recently, text-to-image (T2I) generation has advanced considerably through improvements in synthesis authenticity, text consistency, and generation diversity. However, the large amount of paired image–text data required restricts the generalization of synthesis models to their pre-training language alone. In this paper, a cross-lingual pre-training method is proposed to adapt a target low-resource language to pre-trained generative models. To the best of our knowledge, this is the first time that arbitrary input languages can access T2I generation. The joint encoding scheme fulfills both universal and visual semantic alignment. Given any prepared GAN-based T2I framework, the pre-trained source encoder can be easily fine-tuned to construct a target encoder, thereby fully transferring T2I synthesis ability between languages. A semantic-level alignment, independent of the source T2I structure, is then established to guarantee optimal text consistency and detail generation. Unlike monolingual T2I methods that apply a discriminator to enhance generation quality, we use an adversarial training scheme that optimizes sentence-level alignment together with word-level alignment via a self-attention mechanism. Since low-resource languages often lack parallel texts in practice, the target input embedding is designed to support zero-shot learning. Experimental results demonstrate the robustness of the proposed cross-lingual T2I pre-training across multiple downstream generative models and target languages.
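To make the alignment idea concrete, below is a minimal PyTorch sketch of cross-lingual encoder alignment in the spirit of the abstract, not the paper's actual implementation: the toy encoder, dimensions, loss choices, and weights are all illustrative assumptions. It freezes a pre-trained source-language encoder, fine-tunes a target-language encoder to match it at the sentence level and, via a soft attention map, at the word level, and adds a sentence-level adversarial term in place of an image discriminator.

```python
# Minimal sketch, NOT the paper's implementation: the encoder architecture,
# dimensions, MSE losses, and loss weights below are all assumptions.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

EMB_DIM = 256  # assumed embedding size


class ToyTextEncoder(nn.Module):
    """Stand-in text encoder: embeddings plus one self-attention layer."""

    def __init__(self, vocab_size: int, emb_dim: int = EMB_DIM):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.attn = nn.MultiheadAttention(emb_dim, num_heads=4, batch_first=True)

    def forward(self, tokens: torch.Tensor):
        x = self.embed(tokens)            # (B, T, D) word-level features
        x, _ = self.attn(x, x, x)         # self-attention over words
        return x.mean(dim=1), x           # sentence embedding, word embeddings


source_enc = ToyTextEncoder(vocab_size=30000)   # pre-trained, kept frozen
target_enc = ToyTextEncoder(vocab_size=20000)   # fine-tuned for the new language
for p in source_enc.parameters():
    p.requires_grad = False

# Sentence-level discriminator for the adversarial alignment term.
disc = nn.Sequential(nn.Linear(EMB_DIM, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))

opt_g = torch.optim.Adam(target_enc.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()


def train_step(src_tokens: torch.Tensor, tgt_tokens: torch.Tensor):
    """One alignment step on a batch of (pseudo-)parallel sentence pairs."""
    with torch.no_grad():
        src_sent, src_words = source_enc(src_tokens)
    tgt_sent, tgt_words = target_enc(tgt_tokens)

    # Discriminator: tell source sentence embeddings from target ones.
    ones = torch.ones(src_sent.size(0), 1)
    zeros = torch.zeros(tgt_sent.size(0), 1)
    d_loss = bce(disc(src_sent), ones) + bce(disc(tgt_sent.detach()), zeros)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Sentence-level alignment: match the frozen source embedding.
    sent_align = F.mse_loss(tgt_sent, src_sent)

    # Word-level alignment: softly align each target word to source words.
    attn = F.softmax(tgt_words @ src_words.transpose(1, 2) / math.sqrt(EMB_DIM), dim=-1)
    word_align = F.mse_loss(tgt_words, attn @ src_words)

    # Adversarial term: make target embeddings indistinguishable from source.
    adv = bce(disc(tgt_sent), torch.ones(tgt_sent.size(0), 1))

    g_loss = sent_align + word_align + 0.1 * adv   # weights are arbitrary here
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Calling train_step with integer token tensors of shape (batch, seq_len) runs one update. In the zero-shot setting the abstract mentions, such parallel pairs would not be available, so the word-level term would instead have to rely on a shared cross-lingual embedding space.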
