Face forgery by DeepFake has raised widespread concern in the community because of the risks synthesized media pose to society. Recent advances, however, can produce synthetic images that are indistinguishable from real ones in the RGB space. Extracting mid-frequency facial geometry details, including person-specific details and dynamic expression-dependent ones on the facial surface, is a promising way to expose forgery clues for face forgery detection. In this paper, we use 3D face reconstruction to generate a displacement map from a single input face image, which represents middle- and fine-scale details as signed distances at each point in UV space. The cropped face image also carries eye and mouth information, so we extract image features from both the face image and its displacement map. In addition, we reduce computation cost while maintaining competitive performance by adopting a universal transformer architecture, and we introduce a manifold distillation strategy to train our model from a more complex transformer backbone. Extensive experiments on various public DeepFake datasets demonstrate the effectiveness of the extracted facial geometry details, and the proposed method achieves competitive performance.
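The abstract does not specify an implementation, but the two ingredients it names can be sketched at a high level: concatenating the RGB face crop with its displacement map as model input, and a manifold distillation objective that matches pairwise patch-token similarities between a lightweight student and a stronger teacher. The sketch below is illustrative only, assuming a PyTorch setting with (B, N, C) patch features; the function names and tensor layout are our assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def fuse_inputs(face_rgb: torch.Tensor, disp_map: torch.Tensor) -> torch.Tensor:
    """Stack the RGB face crop (B, 3, H, W) with its single-channel
    UV displacement map (B, 1, H, W) into a 4-channel input tensor.
    This is one plausible fusion; the paper may combine them differently."""
    return torch.cat([face_rgb, disp_map], dim=1)  # (B, 4, H, W)

def manifold_distillation_loss(student_feats: torch.Tensor,
                               teacher_feats: torch.Tensor) -> torch.Tensor:
    """Match the patch-level manifold structure of the student to the
    teacher by comparing pairwise cosine similarities of patch tokens.

    student_feats, teacher_feats: (B, N, C) patch-token features from
    the universal transformer (student) and the complex backbone (teacher).
    """
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    # Relation (Gram) matrices over the N patch tokens: (B, N, N)
    rel_s = s @ s.transpose(1, 2)
    rel_t = t @ t.transpose(1, 2)
    return F.mse_loss(rel_s, rel_t)

# Usage sketch: combine with the task loss when training the student.
# x = fuse_inputs(face_rgb, disp_map)
# loss = cls_loss + lambda_md * manifold_distillation_loss(s_feats, t_feats)
```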