Abstract

This research investigates the challenges posed by the predominant focus of text-to-image generation (TTI) on English, a consequence of the scarcity of annotated image-caption data in other languages. The resulting inequitable access to TTI technology in non-English-speaking regions motivates research into multilingual TTI (mTTI) and the potential of neural machine translation (NMT) to facilitate its development. The study presents two main contributions. First, a systematic empirical study based on a multilingual multi-modal encoder evaluates standard cross-lingual NLP methods applied to mTTI: TRANSLATE TRAIN, TRANSLATE TEST, and ZERO-SHOT TRANSFER. Second, a novel parameter-efficient approach called Ensemble Adapter (ENSAD) is introduced, which leverages multilingual text knowledge within the mTTI framework to mitigate the language gap and improve mTTI performance. The research also addresses challenges of transformer-based TTI models, such as slow generation and the complexity of producing high-resolution images, proposing hierarchical transformers and local parallel autoregressive generation to overcome these limitations. A 6B-parameter transformer pretrained with a cross-modal general language model (CogLM) and fine-tuned for fast super-resolution yields a new text-to-image system, denoted as It, which achieves performance competitive with the state-of-the-art DALL-E 2. Furthermore, It supports interactive text-guided editing of images, offering a versatile and efficient solution for text-to-image generation.

Keywords: Text-to-image generation, Multilingual TTI (mTTI), Neural machine translation (NMT), Cross-lingual NLP, Ensemble Adapter (ENSAD), Hierarchical transformers, Super-resolution, Transformer-based models, Cross-modal general language model (CogLM).
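To make the Ensemble Adapter idea summarized above more concrete, the minimal Python sketch below fuses per-language encodings of the same caption (as produced by a frozen multilingual multi-modal encoder) into a single conditioning vector through a small bottleneck adapter with learned attention weights. This is an illustrative sketch only: the class name EnsembleAdapter, the dimensions, and the exact fusion scheme are assumptions, not the paper's implementation.

# Minimal sketch (not the paper's implementation) of an "ensemble adapter":
# a small, parameter-efficient module that fuses text encodings of the same
# caption in several languages into one conditioning vector for a TTI model.
# Names and dimensions are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class EnsembleAdapter(nn.Module):
    """Fuse K per-language sentence embeddings with learned attention weights."""

    def __init__(self, hidden_dim: int, bottleneck_dim: int = 64):
        super().__init__()
        # Bottleneck projection keeps the adapter parameter-efficient.
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        # Scores each language-specific encoding for the weighted ensemble.
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, encodings: torch.Tensor) -> torch.Tensor:
        # encodings: (batch, num_languages, hidden_dim)
        weights = F.softmax(self.score(encodings), dim=1)   # (B, K, 1)
        fused = (weights * encodings).sum(dim=1)            # (B, H)
        # Residual adapter transformation on the fused representation.
        return fused + self.up(F.relu(self.down(fused)))


if __name__ == "__main__":
    # Toy usage: 2 captions, each encoded in 3 languages by a frozen
    # multilingual encoder producing 512-dim sentence embeddings.
    adapter = EnsembleAdapter(hidden_dim=512)
    multilingual_encodings = torch.randn(2, 3, 512)
    conditioning = adapter(multilingual_encodings)
    print(conditioning.shape)  # torch.Size([2, 512])

Only the adapter's small projection and scoring layers are trained, so the frozen multilingual encoder and the downstream TTI generator remain untouched, which is what makes the approach parameter-efficient.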

