Abstract

DeepFakes can severely affect people's lives and erode trust in digital media, so DeepFake detection methods have developed rapidly. Most existing detection methods rely on features from a single space (usually the RGB space), and multi-space feature fusion remains relatively unexplored. In addition, many existing methods use a single receptive field, so the resulting models cannot extract information at different scales. To address these problems, we propose a two-stream Xception network (Tception) that fuses RGB-space and noise-space features. The network consists of two main parts. The first is a feature fusion module, which adaptively fuses RGB features with noise-space features generated from RGB images by SRM filters. The second is the two-stream network itself, which uses parallel convolutional kernels of different sizes so that the network can learn features at different scales. Experiments show that the proposed method outperforms the Xception network; compared to SSTNet, detection accuracy on Neural Textures is improved by nearly 8%.
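For readers who want a concrete picture of the two components described above, the following is a minimal PyTorch sketch, not the authors' implementation. The SRM kernel values shown are the three high-pass filters commonly used in noise-stream networks; the gated fusion design, branch widths, and kernel sizes (3 and 5) are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Three 5x5 high-pass SRM kernels widely used for noise-residual
# extraction (a common choice; the paper's exact filter bank may differ).
SRM_KERNELS = torch.tensor([
    [[0, 0, 0, 0, 0], [0, -1, 2, -1, 0], [0, 2, -4, 2, 0],
     [0, -1, 2, -1, 0], [0, 0, 0, 0, 0]],
    [[-1, 2, -2, 2, -1], [2, -6, 8, -6, 2], [-2, 8, -12, 8, -2],
     [2, -6, 8, -6, 2], [-1, 2, -2, 2, -1]],
    [[0, 0, 0, 0, 0], [0, 0, 0, 0, 0], [0, 1, -2, 1, 0],
     [0, 0, 0, 0, 0], [0, 0, 0, 0, 0]],
], dtype=torch.float32) / torch.tensor([4., 12., 2.]).view(3, 1, 1)


class SRMNoiseExtractor(nn.Module):
    """Applies the fixed SRM filter bank to each RGB channel (depthwise),
    producing a 9-channel noise-space map (3 residuals x 3 channels)."""
    def __init__(self):
        super().__init__()
        # Weight shape (9, 1, 5, 5) for a grouped convolution over RGB.
        self.register_buffer("weight",
                             SRM_KERNELS.unsqueeze(1).repeat(3, 1, 1, 1))

    def forward(self, rgb):  # rgb: (B, 3, H, W)
        return F.conv2d(rgb, self.weight, padding=2, groups=3)


class AdaptiveFusion(nn.Module):
    """Learns a soft per-channel/per-pixel gate over the concatenated
    RGB and noise features -- a minimal stand-in for the paper's
    adaptive feature fusion module (design assumed)."""
    def __init__(self, rgb_ch, noise_ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(rgb_ch + noise_ch, rgb_ch + noise_ch, 1),
            nn.Sigmoid())

    def forward(self, rgb_feat, noise_feat):
        x = torch.cat([rgb_feat, noise_feat], dim=1)
        return x * self.gate(x)


class MultiScaleBlock(nn.Module):
    """Parallel branches with different kernel sizes so the network can
    capture artifacts at several scales (branch layout assumed)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branch3 = nn.Conv2d(in_ch, out_ch // 2, 3, padding=1)
        self.branch5 = nn.Conv2d(in_ch, out_ch // 2, 5, padding=2)

    def forward(self, x):
        return torch.cat([F.relu(self.branch3(x)),
                          F.relu(self.branch5(x))], dim=1)


# Usage: wire the pieces together on a dummy batch.
x = torch.randn(2, 3, 299, 299)        # Xception's usual input size
noise = SRMNoiseExtractor()(x)         # (2, 9, 299, 299)
fused = AdaptiveFusion(3, 9)(x, noise) # (2, 12, 299, 299)
out = MultiScaleBlock(12, 64)(fused)   # (2, 64, 299, 299)
```

In a full model, blocks like these would feed the fused multi-scale features into the two Xception-style streams; the channel counts above are arbitrary placeholders.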
