Abstract. In 2020, Jonathan Ho et al. from the University of California, Berkeley proposed Denoising Diffusion Probabilistic Models (DDPM), which addressed shortcomings of the earlier Diffusion Probabilistic Models (DPM). By training the network to predict noise, DDPM also surpassed previous generative models in image synthesis quality, including Generative Adversarial Networks (GAN), Variational Auto-Encoders (VAE), flow-based models and energy-based models. Since then, denoising diffusion probabilistic models have attracted growing discussion and research. In 2022, OpenAI launched DALL-E 2, realizing Text-to-Image generation with a denoising diffusion probabilistic model. Two years later, with the release of Sora, OpenAI's Text-to-Video large model, denoising diffusion probabilistic models received unprecedented attention in the multi-modal field. This article first reviews the background and development of denoising diffusion probabilistic models. It then introduces the algorithmic principles of diffusion models and lists their applications in the image domain. Finally, it discusses the advantages, disadvantages and future trends of denoising diffusion probabilistic models.