Abstract

Adapting text-to-image (T2I) models built on pre-trained diffusion models to domain-specific generation tasks is a common problem. In particular, generating Chinese landscape paintings, which have unique characteristics, suffers from a scarcity of fine-grained contextual details specific to such artwork. Moreover, the substantial amount of non-landscape-painting data used during pre-training predisposes the model toward other visual styles, so the generated images can lack the distinctive traits of Chinese paintings. In this paper, we propose a Fine-grained Hierarchical Semantic Adapter for Chinese landscape painting generation, namely FHS-adapter. The method orchestrates the diffusion process in a batch-wise manner, using external fine-grained, multi-perspective information to guide it. It gradually diminishes the influence of other image styles embedded in the pre-trained diffusion model, ultimately preserving more landscape painting elements. We also replace the encoder with the Taiyi-CLIP encoder, which is adapted for Chinese. In addition, we introduce T2ICLP, a multimodal dataset containing 10,000 high-quality image-text pairs of Chinese landscape paintings. Unlike previous datasets, it provides fine-grained textual information from four perspectives: Meta, Description, Sentiment, and Poem. We compare the proposed model with mainstream diffusion-based T2I models. An anonymous user study indicates that FHS-adapter performs well at simulating brushwork (e.g., 'Gou, Cun, Dian, Ran', meaning hooking, texturing, dotting, and dyeing), compositional space, elemental proportions, and color usage across different painting genres and artists. Our dataset is available at https://github.com/T2ICLP/t2iclp.
