Audio Stream Analysis for Deep Fake Threat Identification

Karol Jędrasiak

doi:10.31648/cetl.9684

Abstract

This article introduces a novel approach for identifying deep fake threats in audio streams, specifically targeting the detection of synthetic speech generated by text-to-speech (TTS) algorithms. The system has two core components: the Vocal Emotion Analysis (VEA) Network, which captures the emotional nuances expressed in speech, and the Supervised Classifier for Deepfake Detection, which uses the emotional features extracted by the VEA Network to distinguish authentic from fabricated audio tracks. The system exploits the limited ability of deepfake algorithms to replicate the emotional complexity inherent in human speech, which adds a semantic layer of analysis to the detection process. The robustness of the proposed methodology was evaluated across a variety of datasets to check that its efficacy is not confined to controlled conditions but extends to realistic and challenging environments. To this end, data augmentation techniques were used, including additive white noise, which mimics the variability encountered in real-world audio processing. The results show that the system's performance is consistent across different datasets and that it maintains high accuracy in the presence of background noise, particularly when trained on noise-augmented datasets. By using emotional content as a distinctive feature and applying machine learning techniques, the proposed system offers a robust framework for safeguarding against the manipulation of audio content. This methodological contribution is intended to enhance the integrity of digital communications at a time when synthetic media is proliferating rapidly.
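
The abstract names additive white noise as one of the augmentation techniques but does not say how it was applied. As a minimal sketch only, assuming Gaussian noise scaled to a target signal-to-noise ratio, the idea can be illustrated as follows. The function names, the `snr_db` parameter, and the toy 440 Hz tone are hypothetical and do not come from the article. The sketch uses only the Python standard library so that it runs without extra dependencies.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with additive white Gaussian noise.

    `snr_db` is the target signal-to-noise ratio in decibels. It is an
    assumed parameter: the article does not state which noise levels it used.
    """
    if rng is None:
        rng = random.Random()
    # Mean power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise), solved for P_noise.
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    # Independent zero-mean Gaussian samples give a flat (white) spectrum.
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Toy input: one second of a 440 Hz tone sampled at 16 kHz (illustrative values).
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate)
         for t in range(sample_rate)]
noisy = add_white_noise(clean, snr_db=10, rng=random.Random(0))
```

In a training setup of the kind the abstract describes, noisy copies like `noisy` would be added alongside the clean recordings before feature extraction. For real audio, an array library would be the more practical choice than Python lists.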

Journal: Civitas et Lex
Publication Date: Apr 2, 2024
License type: CC BY-NC-ND 4.0

