Abstract

Facial expressions convey essential information in human interaction. Diffusion models can generate high-quality images of clear, discriminative faces, but their long training and inference times hamper practical application. Latent-space diffusion models have shown promise in speeding up training by operating on feature-space representations, but they require additional network structures. To address these limitations, we propose a contour wavelet diffusion model that accelerates both training and inference. We use a contour wavelet transform to extract components from images and features, achieving substantial acceleration while preserving reconstruction quality. A normalised random channel attention module improves the quality of generated images by focusing on high-frequency information, and a reconstruction loss function speeds convergence. Experimental results demonstrate that our approach boosts the training and inference speeds of diffusion models without sacrificing image quality. Fast generation of facial expressions enables a smoother, more natural user experience, which is important for real-time applications. Moreover, faster inference reduces the demand on computational resources, lowering system cost and improving energy efficiency, which supports the broader adoption of this technology.
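The paper's contour wavelet transform is a directional, multi-scale decomposition; as a rough illustration of why wavelet-domain diffusion is faster, the sketch below uses a plain single-level 2D Haar wavelet transform (a simplifying stand-in, not the authors' transform). It decomposes an image into one low-frequency subband (LL) and three high-frequency subbands (LH, HL, HH), each at half the spatial resolution, so a diffusion model operating on the LL subband processes four times fewer pixels per level while the transform remains perfectly invertible.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar wavelet transform.

    Splits an (H, W) image into four (H/2, W/2) subbands:
    LL (low-frequency approximation) and LH/HL/HH (high-frequency detail).
    """
    # Transform along columns: pairwise averages and differences of rows.
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # vertical averages
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # vertical differences
    # Transform along rows of each intermediate band.
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: perfect reconstruction of the original image."""
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2] = ll + lh
    a[:, 1::2] = ll - lh
    d[:, 0::2] = hl + hh
    d[:, 1::2] = hl - hh
    x = np.empty((2 * h, 2 * w))
    x[0::2, :] = a + d
    x[1::2, :] = a - d
    return x
```

In a wavelet-domain diffusion pipeline, the forward process would be run on `ll` (optionally with the detail subbands handled by a lightweight branch, as the attention module above suggests for high-frequency content), and `haar_idwt2` would map generated subbands back to pixel space.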
