Cross-domain face translation aims to transfer face images from one domain to another. It has many practical applications, such as photo/sketch translation in law enforcement, photo/drawing translation in digital entertainment, and near-infrared (NIR)/visible (VIS) image translation in security access control. Because paired cross-domain face images are scarce, existing methods often produce structural deformation or identity ambiguity, which degrades perceptual quality. To address this challenge, we propose a multi-view knowledge ensemble framework with frequency consistency (MvKE-FC) for cross-domain face translation, where the multi-view knowledge comprises structural knowledge and identity knowledge. Because facial components are structurally consistent, the multi-view knowledge learned from large-scale data can be transferred to the limited cross-domain image pairs and significantly improve generative performance. To better fuse the multi-view knowledge, we design an attention-based knowledge aggregation module that integrates useful information, and we develop a frequency-consistent (FC) loss that constrains the generated images in the frequency domain. The FC loss consists of a multidirection Prewitt (mPrewitt) loss for high-frequency consistency and a Gaussian blur loss for low-frequency consistency. Furthermore, the FC loss can be flexibly applied to other generative models to enhance their performance. Extensive experiments on multiple cross-domain face datasets demonstrate that our method outperforms state-of-the-art methods both qualitatively and quantitatively.
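As a rough illustration of how a loss with a high-frequency Prewitt term and a low-frequency Gaussian blur term could be composed, the following dependency-free Python sketch operates on grayscale images stored as lists of rows. It is not the paper's implementation: the function names (`mprewitt_loss`, `gaussian_blur_loss`, `fc_loss`), the choice of four edge directions, the L1 distance, the kernel size, `sigma`, and the weights `w_high` and `w_low` are all assumptions made for this example.

```python
import math

# Assumed set of directions: horizontal, vertical, and two diagonal
# Prewitt-style kernels. The abstract does not specify which directions
# the mPrewitt loss uses.
PREWITT_KERNELS = [
    [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]],    # horizontal gradient
    [[-1, -1, -1], [0, 0, 0], [1, 1, 1]],    # vertical gradient
    [[0, 1, 1], [-1, 0, 1], [-1, -1, 0]],    # diagonal gradient
    [[-1, -1, 0], [-1, 0, 1], [0, 1, 1]],    # anti-diagonal gradient
]


def gaussian_kernel(size=5, sigma=1.0):
    """Return a normalized size x size Gaussian kernel (entries sum to 1)."""
    half = size // 2
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
               for x in range(-half, half + 1)]
              for y in range(-half, half + 1)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def filter2d(img, kernel):
    """Valid-mode 2D cross-correlation of a grayscale image with a kernel."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(img), len(img[0])
    return [[sum(kernel[i][j] * img[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]


def l1(a, b):
    """Mean absolute difference between two equally sized 2D maps."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b)
             for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def mprewitt_loss(gen, tgt):
    """High-frequency term: L1 distance between edge maps, averaged over directions."""
    per_direction = [l1(filter2d(gen, k), filter2d(tgt, k))
                     for k in PREWITT_KERNELS]
    return sum(per_direction) / len(per_direction)


def gaussian_blur_loss(gen, tgt, size=5, sigma=1.0):
    """Low-frequency term: L1 distance between Gaussian-blurred images."""
    kernel = gaussian_kernel(size, sigma)
    return l1(filter2d(gen, kernel), filter2d(tgt, kernel))


def fc_loss(gen, tgt, w_high=1.0, w_low=1.0):
    """Weighted sum of the high- and low-frequency terms (weights are assumed)."""
    return w_high * mprewitt_loss(gen, tgt) + w_low * gaussian_blur_loss(gen, tgt)
```

In this sketch the two terms respond to different kinds of error. A uniform brightness offset between the generated and target images leaves the edge maps unchanged, so only the Gaussian blur term penalizes it, whereas a blurred or displaced edge mainly raises the Prewitt term. In a training setup such a loss would typically be written with a tensor library so that gradients can flow through the filters.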