Abstract

Recent advances in generative AI have produced increasingly realistic synthetic data, which is used in fields such as data augmentation. However, misuse of deepfake technology has caused growing harm. Consequently, ongoing research analyzes modality characteristics and detects deepfakes with AI-based methods. Existing AI-based deepfake-detection techniques struggle to detect deepfakes in modalities and identities that are not included in the training data. This study proposes a baseline approach based on zero-shot identity and one-shot deepfake detection for detecting deepfakes in environments with limited data. We also propose triple-modality interaction based on a multimodal transformer (TMI-Former) to account for the visual, auditory, and linguistic aspects of deepfakes. TMI-Former comprises four stages: vision feature extraction, representation, residual connection, and late-level fusion. It operates in a two-stage manner, first extracting visual features and then reconstructing them using auditory and linguistic features, thereby enabling triple-modality interaction. In limited-data settings such as zero-shot identity and one-shot deepfake scenarios, TMI-Former improved on unimodal AI by 18.75% to 19.5% in accuracy and by 0.2238 to 0.3561 in F1-score. It also outperformed existing multimodal AI, by 1.44% to 19.75% in accuracy and by 0.0146 to 0.4169 in F1-score.
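The two-stage flow described in the abstract (visual features reconstructed from auditory and linguistic features, followed by a residual connection and late-level fusion) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no layer definitions, so the sketch assumes plain scaled dot-product cross-attention with no learned projections, a shared feature dimension across modalities, and fusion by mean-pooling and concatenation. The names `cross_attend`, `add_residual`, `mean_pool`, and `tmi_sketch` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Rebuild each query vector as an attention-weighted mix of context vectors.

    Assumption: queries and context share the same feature dimension, and no
    learned query/key/value projections are applied.
    """
    scale = math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, c)) / scale for c in context]
        weights = softmax(scores)
        out.append([
            sum(w * c[i] for w, c in zip(weights, context))
            for i in range(len(q))
        ])
    return out


def add_residual(x, y):
    """Element-wise sum of two equally shaped token sequences."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def mean_pool(tokens):
    """Average a token sequence into a single feature vector."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def tmi_sketch(visual, audio, text):
    """Hypothetical stand-in for the reconstruct -> residual -> late-fusion flow."""
    # Reconstruct the visual tokens from each auxiliary modality,
    # keeping the original visual tokens through a residual connection.
    visual_audio = add_residual(visual, cross_attend(visual, audio))
    visual_text = add_residual(visual, cross_attend(visual, text))
    # Late fusion (assumed form): pool each branch, then concatenate.
    return mean_pool(visual_audio) + mean_pool(visual_text)


# Toy inputs: 2 visual tokens, 1 audio token, 2 text tokens, dimension 2.
visual = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.5]]
text = [[1.0, 2.0], [3.0, 4.0]]
fused = tmi_sketch(visual, audio, text)
```

In a real model the fused vector would feed a classification head that outputs the real/fake decision, and each attention step would use learned projections and multiple heads; both are omitted here to keep the sketch short.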
