Grayscale image colorization is highly challenging for ethnic costume images because of their rich and complex color features. In practice, existing colorization methods usually treat the costume image as a whole, ignoring the semantic information of its different parts. The color distribution of each part of an ethnic costume is different, so the color mappings of different parts are also diverse, determined by distinctive ethnic characteristics. Inspired by semantic-level colorization, this study introduces fine-grained semantic information and proposes a high-resolution colorization model targeted at ethnic costume images. The semantic information of the different regions of an ethnic costume has a significant impact on colorization performance. Using Pix2PixHD as the backbone network, we design a new network architecture that exploits fine-grained semantic information to maintain the color distribution correspondence and spatial consistency of costume images. In our network, we splice the fine-grained semantic map of the ethnic costume with the grayscale image, and feed the result into the generative adversarial network as the conditional input. We also analyze the influence of the grayscale channel and the fine-grained semantics on the discriminator. Extensive experiments demonstrate that our method compares favorably with state-of-the-art automatic colorization methods.
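The conditioning step described above, splicing the fine-grained semantic map with the grayscale image before feeding it into the generator, amounts to a channel-wise concatenation. The sketch below illustrates this under assumptions of our own: the helper `build_condition`, the one-hot encoding of the semantic labels, and the channel-first layout are hypothetical choices for illustration, not the paper's actual implementation.

```python
import numpy as np

def build_condition(gray, semantic, num_classes):
    """Concatenate a grayscale image with a one-hot fine-grained semantic
    map along the channel axis, forming the generator's conditional input.
    gray: (H, W) float array; semantic: (H, W) integer label map."""
    # One-hot encode the per-pixel semantic labels: (H, W) -> (H, W, K)
    one_hot = np.eye(num_classes, dtype=np.float32)[semantic]
    # Move channels first: (K, H, W)
    one_hot = one_hot.transpose(2, 0, 1)
    # Prepend the grayscale channel: result is (1 + K, H, W)
    return np.concatenate([gray[None, ...].astype(np.float32), one_hot], axis=0)

# Example: an 8x8 image with 5 fine-grained costume-part classes
gray = np.random.rand(8, 8).astype(np.float32)
labels = np.random.randint(0, 5, size=(8, 8))
cond = build_condition(gray, labels, num_classes=5)
print(cond.shape)  # (6, 8, 8)
```

The spliced tensor `cond` would then be passed to the Pix2PixHD generator in place of its usual label-map input, so that every pixel carries both its luminance and its costume-part identity.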