Denoising Diffusion Models on Model-Based Latent Space

Carmelo Scribano, Danilo Pezzi, Marco Prato, Giorgia Franchini

doi:10.3390/a16110501

Abstract

Recent advances in diffusion generative models have shown that defining the generative process in the latent space of a powerful pretrained autoencoder can offer substantial advantages. By abstracting away imperceptible image details and introducing substantial spatial compression, this approach makes the generative process easier to learn while significantly reducing computational and memory demands. In this work, we propose replacing autoencoder coding with a model-based coding scheme built on traditional lossy image compression techniques; this choice not only further reduces computational cost but also allows us to probe the limits of latent-space image generation. Ultimately, we propose a useful approximation for training continuous diffusion models in a discrete space, together with improvements to the generative model for categorical values. Beyond the good results obtained for the problem at hand, we believe that the proposed work could help extend generative diffusion models to diverse data types beyond images.
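To make the idea of a "model-based" latent space concrete, the sketch below shows one way a traditional lossy codec can act as a fixed, non-learned encoder. It assumes a JPEG-style scheme: an orthonormal 8×8 block DCT followed by uniform quantization to integers. The abstract does not specify the exact coding scheme, so the block size, the single quantization step `q`, the grayscale-only input, and the helper names (`dct_matrix`, `encode`, `decode`) are all illustrative assumptions, not the authors' pipeline. The output is a spatially downsampled grid (factor 8 per side) of 64-dimensional integer vectors, i.e. a discrete latent of the kind on which a diffusion model would then be trained.

```python
import math

BLOCK = 8  # JPEG-style block size (assumption, not taken from the paper)


def dct_matrix(n=BLOCK):
    """Orthonormal DCT-II basis matrix C (n x n), so that C @ C^T = I."""
    mat = []
    for k in range(n):
        alpha = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        mat.append([alpha * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n)])
    return mat


def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def encode(image, q=16.0):
    """Hypothetical encoder: map an H x W grayscale image (H, W multiples of
    BLOCK, values in [0, 255]) to an (H/BLOCK) x (W/BLOCK) grid of
    64-dimensional integer coefficient vectors."""
    c = dct_matrix()
    ct = transpose(c)
    latent = []
    for by in range(0, len(image), BLOCK):
        row = []
        for bx in range(0, len(image[0]), BLOCK):
            # Level shift to center pixel values around zero, as JPEG does.
            block = [[image[by + i][bx + j] - 128.0 for j in range(BLOCK)]
                     for i in range(BLOCK)]
            coeffs = matmul(matmul(c, block), ct)  # 2D DCT: C X C^T
            # Uniform quantization; real JPEG uses per-frequency tables.
            row.append([round(v / q) for r in coeffs for v in r])
        latent.append(row)
    return latent


def decode(latent, q=16.0):
    """Hypothetical decoder: dequantize and apply the inverse 2D DCT."""
    c = dct_matrix()
    ct = transpose(c)
    h, w = len(latent) * BLOCK, len(latent[0]) * BLOCK
    image = [[0.0] * w for _ in range(h)]
    for by, row in enumerate(latent):
        for bx, vec in enumerate(row):
            coeffs = [[vec[i * BLOCK + j] * q for j in range(BLOCK)]
                      for i in range(BLOCK)]
            block = matmul(matmul(ct, coeffs), c)  # inverse: C^T Y C
            for i in range(BLOCK):
                for j in range(BLOCK):
                    image[by * BLOCK + i][bx * BLOCK + j] = block[i][j] + 128.0
    return image
```

For a 256×256 image this yields a 32×32 grid of 64-dimensional integer vectors. Because the DCT is orthonormal, the reconstruction error of each block is bounded by the quantization error, and a coarser step `q` trades fidelity for fewer distinct (and more often zero) coefficient values. Unlike an autoencoder, such an encoder requires no training and costs only a few matrix products per block.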

Full Text