Abstract

The Stable Diffusion model has been widely applied to architectural image generation, but the controllability of the generated image content still leaves room for improvement. This work proposes a multi-network text-to-building-facade image generation method. We first fine-tune the Stable Diffusion model on the CMP Facades dataset using LoRA (Low-Rank Adaptation), and then apply a ControlNet model to further constrain the output. Finally, we compare facade generation results under different architectural-style text prompts and control strategies. The results show that the LoRA training approach substantially reduces the cost of fine-tuning the large Stable Diffusion model, and that adding the ControlNet model improves the controllability of text-to-building-facade image generation. This provides a foundation for subsequent research on architectural image generation.
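As a rough illustration of the pipeline described above, the sketch below shows how a LoRA-adapted Stable Diffusion checkpoint and a ControlNet could be combined using the Hugging Face diffusers library. This is not the authors' implementation: the base model and ControlNet identifiers, the LoRA weight path, the edge-map file, and the prompt are hypothetical placeholders chosen only to make the example runnable.

```python
import torch
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

# Hypothetical ControlNet conditioned on edge maps (stand-in for the paper's control strategy).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)

# Base Stable Diffusion model combined with the ControlNet.
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)

# Load LoRA weights assumed to have been trained on the CMP Facades dataset (placeholder path).
pipe.load_lora_weights("path/to/facade_lora")
pipe.to("cuda")

# Placeholder conditioning image, e.g. an edge map of a facade layout.
control_image = load_image("facade_edges.png")

# Generate a facade image guided by both the style prompt and the control image.
image = pipe(
    "a baroque building facade with detailed windows and ornamentation",
    image=control_image,
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]
image.save("generated_facade.png")
```

In this setup the LoRA weights inject the facade-specific style learned during fine-tuning, while the ControlNet conditioning image constrains the spatial layout of the generated facade, mirroring the two control mechanisms compared in the paper.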
