Generative machine learning models such as Generative Adversarial Networks (GANs) have proven especially successful at generating realistic synthetic data in the image and tabular domains. However, such generative models, as well as the synthetic data they produce, can reveal information contained in their privacy-sensitive training data and must therefore be carefully evaluated before use. The gold-standard method for estimating this privacy leakage is simulating membership inference attacks (MIAs), in which an attacker attempts to learn whether a given sample was part of a generative model's training data. State-of-the-art MIAs against generative models, however, either rely on strong assumptions (knowledge of the exact training dataset size) or require substantial computational resources (to retrain many "surrogate" generative models), making them difficult to apply in practice. In this work, we propose a technique for evaluating privacy risks in GANs that exploits the outputs of the discriminator, a component of the standard GAN architecture. We evaluate the performance of our attacks on two synthetic image generation applications in radiology and ophthalmology, showing that our technique provides a more complete picture of the threats by performing worst-case privacy risk estimation and by identifying attacks with higher precision than prior work.
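To illustrate the general idea of a discriminator-based membership inference attack described above, the following is a minimal sketch, not the paper's actual method: it assumes a trained PyTorch GAN whose discriminator maps a batch of images to real/fake logits, and thresholds the resulting confidence scores. The names (`D`, `loader`, `threshold`) are hypothetical.

```python
# Minimal sketch of a discriminator-score membership inference attack.
# Assumes a trained PyTorch discriminator D producing real/fake logits;
# all names here are illustrative, not taken from the paper.
import torch


@torch.no_grad()
def discriminator_scores(D, loader, device="cpu"):
    """Collect the discriminator's real-confidence score for each candidate sample."""
    D.eval().to(device)
    scores = []
    for x, _ in loader:
        logits = D(x.to(device)).view(-1)        # higher logit -> "more real"
        scores.append(torch.sigmoid(logits).cpu())
    return torch.cat(scores)


def membership_predictions(scores, threshold=0.5):
    """Predict 'member' when the discriminator is unusually confident a sample is real.

    Intuition: the discriminator has seen training members repeatedly and tends to
    assign them higher real-scores than unseen samples from the same distribution.
    In a real evaluation the threshold would be calibrated, e.g. for a target
    precision or false-positive rate.
    """
    return scores >= threshold
```

In practice, the threshold (or a more refined decision rule) would be calibrated per model, and attack quality reported with precision-oriented metrics, in line with the worst-case risk estimation emphasized in the abstract.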