Abstract

Face-to-sketch synthesis transforms optical face images into sketch-style images. However, traditional style losses cannot adequately distinguish the modal differences between optical and sketch-domain images, which leads to blurry results. In addition, traditional approaches disregard high-frequency texture, which further reduces the clarity of the generated images. To address these issues, a modality separation approach for facial sketch synthesis is proposed. First, a modality separation structure is introduced that uses a quicksort algorithm to merge features of optical and sketch images into a target modality (positive samples), so that the feature distribution of the generated images matches that of real sketches. By controlling the Euclidean distances between the generated images (anchors) and both the target modality (positive samples) and the filtered modality (negative samples), irrelevant information is effectively filtered out. Next, an edge-promoting module feeds processed, blurred sketch images into the discriminator to enhance robustness. Finally, a detail optimization module uses Laplacian filtering to extract high-frequency texture from optical face images for local enhancement. Experiments on the CUHK, AR, and XM2VTS datasets show that the method outperforms mainstream face-to-sketch synthesis methods in terms of Fréchet inception distance (FID) and learned perceptual image patch similarity (LPIPS), producing more realistic and natural images with richer texture details.
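Two of the mechanisms named above, the anchor/positive/negative Euclidean-distance constraint and Laplacian high-frequency extraction, can be illustrated with a minimal, dependency-free sketch. It assumes the distance constraint takes the form of a standard triplet hinge loss with a hypothetical `margin` parameter, and that the Laplacian filter is the 4-neighbour 3x3 kernel with replicated borders; the abstract specifies neither, and the feature extractor and the quicksort-based feature merging are not modeled. The function names `triplet_separation_loss` and `laplacian_high_freq` are illustrative only and are not taken from the paper.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean (L2) distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_separation_loss(anchor: Sequence[float],
                            positive: Sequence[float],
                            negative: Sequence[float],
                            margin: float = 1.0) -> float:
    """Assumed triplet hinge form (hypothetical, not confirmed by the abstract):
    pull the anchor (generated-image features) toward the positive
    (target-modality features) and push it at least `margin` farther
    from the negative (filtered-modality features)."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Assumed 4-neighbour discrete Laplacian kernel (symmetric, so correlation
# and convolution give the same result).
LAPLACIAN = [[0, 1, 0],
             [1, -4, 1],
             [0, 1, 0]]


def laplacian_high_freq(image: Matrix) -> Matrix:
    """Filter a grayscale image with the 3x3 Laplacian, replicating border
    pixels. Flat regions map to 0; edges and fine texture give large
    responses, i.e. the high-frequency component."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates to replicate the border pixels.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += LAPLACIAN[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc
    return out
```

In this sketch the loss is zero once the anchor is closer to the positive than to the negative by at least `margin`, and the Laplacian response of a constant image is zero everywhere, so only texture and edges would contribute to any subsequent local enhancement.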