Adapting text-to-image (T2I) models built on pre-trained diffusion models to domain-specific generation tasks is a common problem. In particular, generating Chinese landscape paintings, which have unique characteristics, suffers from a scarcity of fine-grained contextual details specific to such artwork. Moreover, because pre-training relies on substantial amounts of non-landscape painting data, the model is predisposed to be swayed by other visual styles, so the generated images often lack the distinctive traits of Chinese paintings. In this paper, we propose a Fine-grained Hierarchical Semantic Adapter for Chinese landscape painting generation, named FHS-adapter. The method orchestrates the diffusion process in a batch-wise manner, guiding it with external fine-grained, multi-perspective information. It gradually diminishes the influence of images in other styles embedded in the pre-trained diffusion model, so that more landscape painting elements are preserved. We also replace the encoder with the Taiyi-CLIP encoder, which is adapted for Chinese. In addition, we introduce T2ICLP, a multimodal dataset containing 10,000 high-quality image-text pairs of Chinese landscape paintings. Unlike previous datasets, T2ICLP extracts fine-grained textual information from four perspectives: Meta, Description, Sentiment, and Poem. We compare the proposed model with mainstream diffusion-based T2I models. An anonymous user study indicates that FHS-adapter performs well in simulating brushwork (e.g., 'Gou, Cun, Dian, Ran', meaning hooking, texturing, dotting, and dyeing), compositional space, elemental proportions, and color usage across different painting genres and artists. Our dataset is available at https://github.com/T2ICLP/t2iclp.