Abstract
Recent advancements in conditional generative models, e.g., the Conditional Variational AutoEncoder (CVAE) and the Conditional Denoising Diffusion Probabilistic Model (CDDPM), which use class-specific embeddings to generate images of specific classes, have demonstrated exceptional capabilities in producing high-quality images on balanced datasets. However, their performance degrades on imbalanced real-world data, reducing both the fidelity and the diversity of the generated images. This paper examines the causes of, and solutions to, the imbalance issue in CVAE and CDDPM. By selectively reweighting the gradients of the embedding layer and the main network, we identify bias in the embedding layer as a critical bottleneck in these models. Motivated by this finding, we propose Embedding Pretraining and Regularization (PREmbed), a training methodology that addresses biased learning in the embedding layer of these models. PREmbed consists of two key components: (i) Masked Autoencoder pretraining on the original imbalanced dataset to obtain a robust initial embedding, and (ii) an embedding regularization loss that preserves class-level distances during training to improve imbalanced embedding learning. PREmbed promotes convergence to a robust embedding layer. Our experiments on imbalanced CIFAR-10, CUB-200, and ImageNet-LT demonstrate that PREmbed enhances the performance of both CVAE and CDDPM: it outperforms the baseline CVAE by 22.3% (FID 18.63) and the baseline CDDPM by 30.3% (FID 12.70) on imbalanced CIFAR-10; by 33.7% (FID 24.08) and 26.5% (FID 13.90) on CUB-200; and by 16.4% (FID 49.41) and 15.2% (FID 33.05) on ImageNet-LT, where it achieves state-of-the-art performance on the imbalanced conditional image generation task.
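The abstract does not specify the exact form of the embedding regularization loss; the sketch below illustrates one plausible reading in PyTorch, namely penalizing drift in pairwise class-level distances relative to a frozen embedding obtained from Masked Autoencoder pretraining. The function name `embedding_regularization_loss`, the reference tensor `ref_embeddings`, and the MSE formulation are illustrative assumptions, not the paper's definition.

```python
import torch
import torch.nn.functional as F

def embedding_regularization_loss(embeddings: torch.Tensor,
                                  ref_embeddings: torch.Tensor) -> torch.Tensor:
    """Penalize changes in pairwise class-to-class distances (a sketch).

    embeddings:     (C, D) current class embeddings being trained.
    ref_embeddings: (C, D) frozen reference embeddings, e.g., from MAE pretraining.
    """
    # Pairwise Euclidean distances between all class embeddings.
    cur_dist = torch.cdist(embeddings, embeddings)          # shape (C, C)
    ref_dist = torch.cdist(ref_embeddings, ref_embeddings)  # shape (C, C)
    # Encourage current class-level distances to match the pretrained ones.
    return F.mse_loss(cur_dist, ref_dist)
```

Under this reading, the term would be added to the main generative objective with a weighting coefficient (the coefficient's value is not given in the abstract), so that minority-class embeddings retain the geometry learned during pretraining rather than collapsing toward majority classes.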