The Comparison of the Effectiveness and Efficiency of Fine-Tuning Models on Stable Diffusion in Creating Concept Art

Abdul Bilal Qowy, Sri Hartati, Ahmad Nur Ihsan

doi:10.15408/jti.v17i1.37942

Abstract

This research aims to overcome the limitations of the Stable Diffusion model in creating concept art, covering problem identification, research objectives, methodology, and results. Although Stable Diffusion has been recognized as the best model, especially for creating concept art, there is still a need to simplify the process of creating concept art and to find the most suitable generative model. The research used three methods: the Latent Diffusion Model, Dreambooth fine-tuning, and Stable Diffusion. The results show that, of the two fine-tuning approaches compared, Dreambooth produces a more realistic painting style, while Textual Inversion tends toward a fantasy, cartoon-like style. Both are highly effective and differ only minimally, but Dreambooth proves more effective based on the consistency of its FID, PSNR, and visual perception scores. Dreambooth is also more efficient in training time, although it requires more memory, while inference time is similar for both. This research contributes to the development of artificial intelligence in the creative industries, opens up opportunities to improve the use of generative models in creating concept art, and can potentially drive positive change in how artificial intelligence is used in the creative industries more broadly.
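As an illustration of one of the metrics named in the abstract, the following is a minimal sketch of the standard PSNR definition, 10 * log10(MAX^2 / MSE), where a higher value means the generated image is closer to the reference. The abstract does not say how PSNR was computed in the study, so the function name `psnr`, the flat pixel-list input, and the 8-bit maximum of 255 are all assumptions made for this example.

```python
import math


def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences.

    Hypothetical helper: inputs are flat sequences of pixel intensities,
    and max_value is the largest possible intensity (255 assumed for 8-bit images).
    """
    if len(reference) != len(generated):
        raise ValueError("images must have the same number of pixels")
    if not reference:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixels.
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)

    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded

    return 10.0 * math.log10(max_value ** 2 / mse)
```

In practice a library routine operating on image arrays would likely be used instead; the sketch only makes the arithmetic behind the score explicit.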
