Abstract

In recent years, deepfake videos have been abused to create fake news, threatening the integrity of digital video. Although existing detection methods leverage cumbersome neural networks to achieve promising detection performance, they cannot be deployed in resource-constrained scenarios. To overcome this limitation, we propose a novel model compression framework for deepfake detection based on joint distillation, which comprises a pre-training stage and a knowledge transfer stage. In the pre-training stage, a teacher network is trained with sufficient labeled samples. In the knowledge transfer stage, a lightweight student network is constructed with dimension alignment taken into account. To transfer forensic knowledge comprehensively, we design a joint distillation loss that combines cross-entropy loss, knowledge distillation loss, and gradient-guided feature distillation loss. For feature distillation, feature maps from both shallow and deep layers are used to compute a channel-wise mean squared error weighted by gradient information, so that knowledge of forensic features is transferred adaptively. In addition, a decayed teaching strategy adjusts the importance of feature distillation over training, mitigating the risk of negative transfer. Extensive experiments show that student networks obtained by our compression method achieve competitive detection performance and outstanding efficiency, distinctly reducing computational costs compared with state-of-the-art methods.
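To make the loss design concrete, the following is a minimal PyTorch sketch of such a joint distillation objective. The abstract does not specify the exact formulation, so the loss weights (alpha, beta), the temperature, the normalized channel-gradient weighting, and the linear decay schedule are all illustrative assumptions, and the student feature maps are assumed to be already dimension-aligned with the teacher's.

```python
import torch
import torch.nn.functional as F

def joint_distillation_loss(student_logits, teacher_logits, labels,
                            student_feats, teacher_feats, teacher_grads,
                            temperature=4.0, alpha=0.5, beta=1.0):
    """Sketch of a joint distillation loss: CE + soft-label KD +
    gradient-weighted channel-wise feature MSE.

    student_feats / teacher_feats: lists of (B, C, H, W) feature maps taken
    from both shallow and deep layers (student already aligned to teacher
    dimensions). teacher_grads: matching gradients of the teacher's loss
    w.r.t. its feature maps, used to weight channels adaptively.
    """
    # Hard-label supervision on the student.
    ce = F.cross_entropy(student_logits, labels)

    # Soft-label knowledge distillation with temperature scaling.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2

    # Gradient-guided feature distillation: channel-wise MSE weighted by the
    # (normalized) gradient magnitude of each teacher channel, so channels the
    # teacher relies on for forensics contribute more to the transfer.
    fd = 0.0
    for s, t, g in zip(student_feats, teacher_feats, teacher_grads):
        w = g.abs().mean(dim=(2, 3))               # (B, C) channel importance
        w = w / (w.sum(dim=1, keepdim=True) + 1e-8)
        mse = ((s - t) ** 2).mean(dim=(2, 3))      # (B, C) channel-wise MSE
        fd = fd + (w * mse).sum(dim=1).mean()

    return ce + alpha * kd + beta * fd

def decayed_beta(beta0, epoch, total_epochs):
    # Decayed teaching: shrink the feature-distillation weight as training
    # progresses to mitigate negative transfer (linear decay is an assumption;
    # the paper may use a different schedule).
    return beta0 * (1.0 - epoch / total_epochs)
```

In such a setup, `decayed_beta` would replace the fixed `beta` at each epoch, so feature-level guidance dominates early training and fades as the student's own representation matures.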
