Memory-Based Learning and Fusion Attention for Few-Shot Food Image Generation Method

Jinlin Ma, Yuetong Wan, Ziping Ma

doi:10.3390/app14188347

Open Access

https://doi.org/10.3390/app14188347

Journal: Applied Sciences
Publication Date: Sep 17, 2024
License type: CC BY 4.0

Abstract

Food image generation aims to convert textual ingredient lists into corresponding images, supporting the visualization of color and shape adjustments, dietary guidance, and the creation of new dishes. It has a wide range of applications, including food recommendation, recipe development, and health management. However, existing food image generation models, which are predominantly based on GANs (Generative Adversarial Networks), struggle to maintain semantic consistency between image and text and to achieve visual realism in the generated images. These limitations are attributed to the constrained representational capacity of sparse ingredient embeddings and the lack of diversity in GAN-based food image generation models. To alleviate these limitations, this paper proposes a food image generation network, named MLA-Diff, in which ingredient and image features are learned and integrated as ingredient-image pairs to generate initial images, after which image details are refined by an attention fusion module. The main contributions are as follows: (1) An enhanced CLIP (Contrastive Language-Image Pre-Training) module is constructed that transforms sparse ingredient embeddings into compact embeddings and captures multi-scale image features, providing an effective way to alleviate semantic consistency issues. (2) A Memory module is proposed that embeds a pre-trained diffusion model to generate diverse and realistic initial images. (3) An attention fusion module is proposed that integrates features from different modalities to improve the mutual understanding between ingredient and image features. Extensive experiments on the Mini-food dataset demonstrate the superiority of MLA-Diff in terms of semantic consistency and visual realism, showing that it generates high-quality food images.
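
The abstract does not specify how the attention fusion module is built. As a rough illustration of the general idea of fusing ingredient and image features with attention, the sketch below uses plain scaled dot-product cross-attention with a residual connection. This is an assumption made for illustration, not the MLA-Diff implementation: the function names, the toy vectors, and the absence of learned projection matrices are all hypothetical simplifications.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention_fuse(image_feats, ingredient_feats):
    """Fuse ingredient features into image features (illustrative only).

    image_feats:      list of d-dim vectors used as queries (e.g. one per patch).
    ingredient_feats: list of d-dim vectors used as keys and values
                      (e.g. one compact embedding per ingredient).
    Returns (fused, weights), where fused has the same shape as image_feats
    and weights[i][j] is the attention of image vector i on ingredient j.
    """
    d = len(image_feats[0])
    scale = 1.0 / math.sqrt(d)
    fused, weights = [], []
    for q in image_feats:
        # Attention weights of this image vector over all ingredients.
        w = softmax([dot(q, k) * scale for k in ingredient_feats])
        # Weighted sum of ingredient vectors (the attended context).
        context = [
            sum(wi * v[j] for wi, v in zip(w, ingredient_feats))
            for j in range(d)
        ]
        # Residual connection: keep the image feature, add the context.
        fused.append([qi + ci for qi, ci in zip(q, context)])
        weights.append(w)
    return fused, weights


# Hypothetical toy data: two image feature vectors, three ingredient embeddings.
image_feats = [[1.0, 0.0], [0.0, 1.0]]
ingredient_feats = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
fused, weights = cross_attention_fuse(image_feats, ingredient_feats)
```

In this toy setup each image vector attends most strongly to the ingredient embedding it is most aligned with, and the residual sum keeps the original image feature while adding ingredient context. A trained module would normally apply learned query, key, and value projections and several attention heads.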
