Abstract
With the continuous advancement of deep learning techniques, research on sketch synthesis has progressed rapidly. However, existing methods still struggle to generate human-like freehand sketches from real-world natural images at both the object and scene levels. To address this, we propose SketchDiffusion, a text-guided freehand sketch synthesis method based on conditional stable diffusion. In SketchDiffusion, we design a novel image-enhancing module to efficiently extract high-quality image features. Moreover, we use additional guidance from global and local features, extracted by a U-shaped diffusion guidance network, to control the noise-addition and denoising processes of the diffusion model, significantly improving the controllability and performance of freehand sketch synthesis. Beyond the model architecture, we design a BLIP-based text generation method to create 70,280 text prompts for foreground, background, and panorama sketch synthesis on the extensive SketchyCOCO dataset, improving the overall effectiveness of model training. Compared to state-of-the-art methods, SketchDiffusion achieves average improvements of over 16.4%, 16.75%, and 12.8% on three quantitative metrics (sketch recognition, sketch-based retrieval, and a user perceptual study), respectively. Furthermore, our approach not only excels at synthesizing freehand sketches containing multiple abstract objects but also supports multiple human–computer interaction applications.
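To make the guidance mechanism described above concrete, the following is a minimal, hypothetical PyTorch sketch of how global and local features from a guidance network could modulate a denoiser, in the spirit of conditioning a diffusion U-Net on an input image. It is not the authors' implementation; all module names, layer sizes, and the injection scheme are assumptions made for illustration only.

```python
# Illustrative sketch (not the SketchDiffusion code): condition a toy denoiser
# on global and local image features produced by a guidance network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidanceNet(nn.Module):
    """Extracts a global feature vector and a local feature map from the image (hypothetical)."""
    def __init__(self, channels=64):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.global_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(channels, channels),
        )

    def forward(self, image):
        local_feat = self.local(image)              # (B, C, H, W): local guidance
        global_feat = self.global_head(local_feat)  # (B, C): global guidance
        return global_feat, local_feat

class GuidedDenoiser(nn.Module):
    """Toy denoiser whose intermediate features are modulated by the guidance signals."""
    def __init__(self, channels=64):
        super().__init__()
        self.inp = nn.Conv2d(3, channels, 3, padding=1)
        self.time_emb = nn.Linear(1, channels)
        self.mid = nn.Conv2d(channels, channels, 3, padding=1)
        self.out = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, noisy_sketch, t, global_feat, local_feat):
        h = self.inp(noisy_sketch)
        # Inject the timestep and global guidance as channel-wise biases,
        # and the local guidance as a spatial residual.
        h = h + self.time_emb(t)[:, :, None, None]
        h = h + global_feat[:, :, None, None] + local_feat
        h = F.silu(self.mid(h))
        return self.out(h)  # predicted noise

# Usage: one guided denoising step on random tensors.
guidance, denoiser = GuidanceNet(), GuidedDenoiser()
image = torch.randn(2, 3, 64, 64)   # natural image providing guidance
noisy = torch.randn(2, 3, 64, 64)   # noised sketch canvas
t = torch.rand(2, 1)                # normalized diffusion timestep
g, l = guidance(image)
eps_pred = denoiser(noisy, t, g, l)
print(eps_pred.shape)               # torch.Size([2, 3, 64, 64])
```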