After several years of development, deep synthesis technology has made significant progress in image and video synthesis. Deep forgeries represented by Deepfakes, which can be used as tools for disinformation attacks, have become a research hotspot. Current strongly discriminative models perform well on specific datasets, with accuracy approaching 100%. Unfortunately, because a specific discriminative method fits only a specific data distribution, and different forgery methods or datasets follow different distributions, these methods fail to achieve high performance in cross-dataset detection. In response to this problem, and focusing on practical conditions, we reformulate cross-dataset generalization as the detection of unseen fake videos. We propose the Multi-Criss-Cross Attention and StyleGANv2 Generative Adversarial Network (MCS-GAN). Firstly, we build a Generative Adversarial Network (GAN) framework to learn the distribution of real face data and generate the corresponding face images. Secondly, to break the tight stitching between the forged region and the background, the model needs sufficiently strong feature analysis and pixel restoration capabilities; we therefore propose a generator consisting of a Multi-Criss-Cross-Attention (MC) encoder and a StyleGANv2 (SG2) decoder. Finally, to avoid the degenerate case in which any face is judged normal or any unfamiliar face is judged abnormal, we add a latent-space encoding discriminator and increase the weight of the latent-space vector, so as to detect anomalies introduced by forgery operations acting on the latent space. We conduct generalization experiments on Internet videos and several popular deepfake datasets. The results show that our method achieves better accuracy than the best existing methods.
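To make the composition described above concrete, the following is a minimal, hypothetical PyTorch sketch of a reconstruction-based detector of this kind: a small encoder standing in for the MC encoder, a small decoder standing in for the SG2 decoder, and a latent-space discriminator whose score is mixed into the anomaly score with an increased latent weight. All class names, layer choices, and the scoring formula are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of an MCS-GAN-style detector (illustrative, not the paper's code).
import torch
import torch.nn as nn

class McEncoder(nn.Module):
    """Stand-in for the Multi-Criss-Cross-Attention encoder: face image -> latent code."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.to_latent = nn.Linear(128, latent_dim)

    def forward(self, x):
        return self.to_latent(self.features(x).flatten(1))

class Sg2Decoder(nn.Module):
    """Stand-in for the StyleGANv2-style decoder: latent code -> reconstructed face image."""
    def __init__(self, latent_dim=512, image_size=64):
        super().__init__()
        self.image_size = image_size
        self.fc = nn.Linear(latent_dim, 128 * (image_size // 4) ** 2)
        self.up = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 128, self.image_size // 4, self.image_size // 4)
        return self.up(h)

class LatentDiscriminator(nn.Module):
    """Scores latent codes; trained on real-face codes so forged faces map to low scores."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1)
        )

    def forward(self, z):
        return self.net(z)

def anomaly_score(x, encoder, decoder, latent_disc, latent_weight=2.0):
    """One plausible way to combine image reconstruction error with a latent-space term;
    `latent_weight` > 1 mimics giving the latent-space vector an increased weight."""
    z = encoder(x)
    x_rec = decoder(z)
    image_err = (x - x_rec).abs().mean(dim=(1, 2, 3))
    latent_err = torch.sigmoid(-latent_disc(z)).squeeze(1)  # low disc score -> anomalous latent
    return image_err + latent_weight * latent_err

# Usage: higher scores indicate faces the real-face model cannot reconstruct, i.e. likely fakes.
faces = torch.randn(4, 3, 64, 64)
enc, dec, ldisc = McEncoder(), Sg2Decoder(), LatentDiscriminator()
print(anomaly_score(faces, enc, dec, ldisc))
```

Under these assumptions, training would fit the encoder, decoder, and discriminators on real faces only, so that both the pixel-level reconstruction and the latent-space score degrade when a forged face is presented.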