Accurate image segmentation is vital for the diagnosis and treatment of nasopharyngeal carcinoma (NPC). Although deep neural networks have shown promising performance in NPC segmentation, their purely data-driven training depends on large-scale datasets with pixel-level annotations, which limits their practical use. To address this challenge, this paper exploits domain knowledge, namely the expert knowledge of radiologists, and proposes a domain knowledge-driven encoder–decoder architecture. Specifically, the domain knowledge on image spatial information is formulated as a Gaussian mixture distribution and then transformed into an optimal transport-based expert-prior regularization of the encoder, which enhances the model’s ability to capture discriminative features. To project the encoded features onto pixel space and obtain the segmentation maps, the decoder includes a theoretically justified cross-scale feature refinement module, which encodes the domain knowledge that radiologists segment NPC through gradual refinement. Experimental results verify the effectiveness of the proposed method for NPC segmentation. Remarkably, despite using only 10% of the pixel-level annotated data, the domain knowledge-driven model outperforms recent deep neural networks trained on the entire training set.
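The abstract does not give the exact form of the expert-prior regularization, so the following is only an illustrative sketch of the general idea: encoded features are compared, through an optimal transport cost, with samples drawn from a Gaussian mixture prior. Everything here is an assumption made for illustration, including the helper names `sample_gmm` and `sinkhorn_cost`, the mixture parameters, the 2-D feature space, and the choice of an entropy-regularized (Sinkhorn) solver; none of it is taken from the paper.

```python
import numpy as np

def sample_gmm(rng, means, stds, weights, n):
    """Draw n samples from an isotropic Gaussian mixture (hypothetical prior)."""
    comps = rng.choice(len(weights), size=n, p=weights)  # component index per sample
    noise = rng.standard_normal((n, means.shape[1]))
    return means[comps] + stds[comps, None] * noise

def sinkhorn_cost(x, y, eps=0.05, n_iters=200):
    """Approximate entropy-regularized OT cost between two point clouds.

    Both clouds carry uniform weights. The regularization strength is set
    relative to the largest pairwise cost to keep the kernel well scaled.
    """
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared Euclidean costs
    a = np.full(len(x), 1.0 / len(x))
    b = np.full(len(y), 1.0 / len(y))
    kernel = np.exp(-cost / (eps * cost.max()))
    u = np.ones_like(a)
    for _ in range(n_iters):  # alternate scaling to match both marginals
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    plan = u[:, None] * kernel * v[None, :]  # approximate transport plan
    return float((plan * cost).sum())

# Assumed prior: two components standing in for, e.g., lesion and background.
rng = np.random.default_rng(0)
means = np.array([[0.0, 0.0], [3.0, 3.0]])
stds = np.array([0.5, 0.5])
weights = np.array([0.5, 0.5])

prior = sample_gmm(rng, means, stds, weights, 128)
features = sample_gmm(rng, means, stds, weights, 128)  # stand-in for encoder output
penalty = sinkhorn_cost(features, prior)  # value a regularizer could add to the loss
```

In a training loop, a term of this kind would be added to the segmentation loss with some weight, so that the encoder is pulled toward the prior; a differentiable framework would be needed in place of NumPy for that.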