Unsupervised image-to-image translation aims to learn a domain mapping that preserves the semantics of an input image while adapting its style to a target domain, without paired data. However, when there is a large semantic mismatch between the source and target domains, current methods often suffer from semantic distortion. Building on dense self-supervised representation learning, we present a novel Multi-Scale Semantic Consistency Regularization (MSSCR) that alleviates semantic distortion and enables the generator to produce images with realistic local semantics and consistent structures. During training, MSSCR learns both local and global multi-scale representations from different layers of the discriminator. Concretely, MSSCR slides a fixed-size window over the overlapping region of two views cropped from a single real image, aligns these windows with their corresponding regions in the multi-scale representations extracted from the discriminator, and maximizes the representation similarity between the resulting positive pairs. Qualitative and quantitative experiments demonstrate the superiority of MSSCR on image-to-image translation and image generation tasks.
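The window-alignment idea described above can be sketched in a few lines. The following is a minimal, self-contained illustration (not the paper's implementation): two overlapping crops are taken from one image, a fixed-size window slides over their shared region, and a similarity loss is computed between corresponding windows. The function names (`toy_features`, `msscr_like_loss`), the crop offsets, and the mean-pooling "feature extractor" standing in for the discriminator layers are all hypothetical choices for illustration.

```python
import numpy as np

def toy_features(patch):
    # Hypothetical stand-in for one discriminator layer's representation:
    # mean-pool each channel of the window into a feature vector.
    return patch.mean(axis=(0, 1))

def cosine(a, b):
    # Cosine similarity between two feature vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def msscr_like_loss(image, crop_size=16, offset=8, win=4, stride=4):
    """Sketch of the multi-scale consistency idea at a single scale:
    crop two overlapping views from one real image, slide a fixed-size
    window over their overlap, and score the similarity between the
    corresponding (positive-pair) windows."""
    # Two views of the same image whose regions overlap.
    view_a = image[0:crop_size, 0:crop_size]                       # at (0, 0)
    view_b = image[offset:offset + crop_size,
                   offset:offset + crop_size]                      # at (offset, offset)
    overlap = crop_size - offset  # side length of the shared region
    sims = []
    for y in range(0, overlap - win + 1, stride):
        for x in range(0, overlap - win + 1, stride):
            # The same image region, addressed in each view's coordinates.
            wa = view_a[offset + y:offset + y + win, offset + x:offset + x + win]
            wb = view_b[y:y + win, x:x + win]
            sims.append(cosine(toy_features(wa), toy_features(wb)))
    # Maximizing similarity of positive pairs = minimizing this loss.
    return 1.0 - float(np.mean(sims))

rng = np.random.default_rng(0)
img = rng.random((32, 32, 3))
loss = msscr_like_loss(img)  # near 0: aligned windows share identical pixels
```

Because both views come from the same real image, perfectly aligned windows contain identical pixels, so the loss is near zero here; in training, the views pass through the generator and discriminator, and minimizing this loss across several discriminator layers enforces consistency at multiple scales.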