Synthetic speech detection through short-term and long-term prediction traces

Clara Borrelli, Stefano Tubaro, Paolo Bestagini, Augusto Sarti, Fabio Antonacci

doi:10.1186/s13635-021-00116-3

Open Access

https://doi.org/10.1186/s13635-021-00116-3

Journal: EURASIP Journal on Information Security
Publication Date: Apr 6, 2021
Citations: 38
License type: open-access

Affiliation: Politecnico di Milano

Abstract

Several methods for synthetic audio speech generation have been developed in the literature through the years. With the great technological advances brought by deep learning, many novel synthetic speech techniques achieving incredibly realistic results have recently been proposed. As these methods generate convincing fake human voices, they can be used maliciously to negatively impact today’s society (e.g., people impersonation, fake news spreading, opinion formation). For this reason, the ability to detect whether a speech recording is synthetic or pristine is becoming an urgent necessity. In this work, we develop a synthetic speech detector. It takes as input an audio recording, extracts a series of hand-crafted features motivated by the speech-processing literature, and classifies them in either a closed-set or an open-set scenario. The proposed detector is validated on a publicly available dataset consisting of 17 synthetic speech generation algorithms, ranging from old-fashioned vocoders to modern deep learning solutions. Results show that the proposed method outperforms recently proposed detectors in the forensics literature.
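The title refers to short-term and long-term prediction traces, but the abstract does not list the exact features. As a rough orientation only, the sketch below shows the classic speech-processing building blocks those terms usually denote: a short-term linear predictor (LPC, solved with the Levinson-Durbin recursion) that removes the spectral envelope, followed by a long-term (pitch) predictor applied to the short-term residual. The function names, the predictor order, the lag range (20 to 160 samples, roughly 400 to 50 Hz at an assumed 8 kHz sampling rate) and the summary values returned by the hypothetical `prediction_features` are our own illustrative choices, not the authors' feature set.

```python
"""Illustrative sketch (NOT the paper's feature extractor): short-term (LPC)
and long-term (pitch) prediction residuals of one speech frame."""
import math
from typing import Dict, List, Tuple


def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """Biased autocorrelation r[k] = sum_n x[n] * x[n - k], k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r: List[float], order: int) -> List[float]:
    """Return a[1..order] such that x_hat[n] = sum_k a[k] * x[n - k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:  # silent frame or already perfectly predicted
            break
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coeff.
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def short_term_residual(x: List[float], order: int) -> Tuple[List[float], List[float]]:
    """LPC coefficients and residual e[n] = x[n] - sum_k a[k] * x[n - k]."""
    a = levinson_durbin(autocorrelation(x, order), order)
    e = [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, order + 1) if n - k >= 0)
         for n in range(len(x))]
    return a, e


def long_term_residual(e: List[float], min_lag: int,
                       max_lag: int) -> Tuple[int, float, List[float]]:
    """Pick the lag T maximizing (sum e[n] e[n-T])^2 / sum e[n-T]^2 (usual
    pitch-predictor criterion); b is the matching least-squares gain."""
    best_lag, best_gain, best_score = min_lag, 0.0, -math.inf
    for lag in range(min_lag, max_lag + 1):
        num = sum(e[n] * e[n - lag] for n in range(lag, len(e)))
        den = sum(e[n - lag] ** 2 for n in range(lag, len(e)))
        if den > 0.0 and num * num / den > best_score:
            best_lag, best_gain, best_score = lag, num / den, num * num / den
    r = [e[n] - best_gain * e[n - best_lag] if n >= best_lag else e[n]
         for n in range(len(e))]
    return best_lag, best_gain, r


def energy(s: List[float]) -> float:
    return sum(v * v for v in s) + 1e-12  # small floor avoids log of zero


def prediction_features(x: List[float], order: int = 10, min_lag: int = 20,
                        max_lag: int = 160) -> Dict[str, object]:
    """Hypothetical per-frame summary; all parameters are assumptions."""
    a, e = short_term_residual(x, order)
    lag, gain, r = long_term_residual(e, min_lag, max_lag)
    return {
        "lpc": a,                                                   # short-term predictor
        "st_gain_db": 10.0 * math.log10(energy(x) / energy(e)),    # short-term prediction gain
        "lt_lag": lag,                                              # long-term (pitch) lag
        "lt_gain": gain,                                            # long-term predictor gain
        "lt_gain_db": 10.0 * math.log10(energy(e) / energy(r)),    # long-term prediction gain
    }


if __name__ == "__main__":
    # Toy "voiced" frame: pulse train with a 50-sample period (160 Hz at an
    # assumed 8 kHz) driving a one-pole resonance.
    frame, y = [], 0.0
    for n in range(800):
        y = 0.9 * y + (1.0 if n % 50 == 0 else 0.0)
        frame.append(y)
    print(prediction_features(frame))
```

On real recordings such a function would typically be applied frame by frame, with the per-frame statistics then passed to a classifier; framing, windowing, and the closed-set or open-set classifier are deliberately left out of this sketch.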

Highlights

The possibility of manipulating digital multimedia objects is within everyone’s reach
In Section 2 (Background), we provide the reader with some background on state-of-the-art algorithms for synthetic speech generation and synthetic speech detection
Text-to-speech (TTS) synthesis was largely based on concatenative waveform synthesis, i.e., given a text as input, the output audio is produced by selecting the correct diphone units from a large dataset of diphone waveforms and concatenating them so that intelligibility is ensured [18,19,20]

Summary

Introduction

The possibility of manipulating digital multimedia objects is within everyone’s reach. The huge technological advances brought by deep learning have delivered a series of artificial intelligence (AI)-driven tools that make manipulations extremely realistic and convincing. All of these tools are surely a great asset in a digital artist’s arsenal. Deepfake AI-driven technology enables replacing one person’s identity with someone else’s in a video [3]. This has been used to disseminate fake news through politician impersonation as well as for revenge porn distribution.

2.1 Fake speech generation

Synthetic speech generation is a problem that has been studied for many years and addressed with several approaches. For this reason, the literature contains a large number of techniques that achieve good results, and there is no single way of generating a synthetic speech track. The main drawback of concatenative synthesis is the difficulty of modifying the voice timbral characteristics, e.g., to change speaker or embed emotional content in the voice.
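The highlights describe concatenative TTS as selecting diphone units from a large dataset of diphone waveforms and concatenating them. The toy sketch below illustrates only that lookup-and-join step. The inventory format, the hypothetical `synthesize` helper, and the linear crossfade at unit boundaries are our own simplifying assumptions; practical systems additionally choose among many candidate units and adjust pitch and duration. The sketch also makes the drawback of concatenative synthesis concrete: the output can only contain the voice timbre that was recorded into the inventory.

```python
"""Toy diphone-concatenation back end (illustrative sketch, hypothetical API)."""
from typing import Dict, List, Sequence


def to_diphones(phones: Sequence[str]) -> List[str]:
    """Pair adjacent phones, e.g. ['sil', 'a', 'sil'] -> ['sil-a', 'a-sil']."""
    return [f"{p}-{q}" for p, q in zip(phones, phones[1:])]


def crossfade_concat(units: Sequence[List[float]], overlap: int) -> List[float]:
    """Append each unit, linearly crossfading over up to `overlap` samples."""
    out: List[float] = []
    for unit in units:
        n = max(0, min(overlap, len(out), len(unit)))  # 0 for the first unit
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit, rising toward 1
            out[start + i] = (1.0 - w) * out[start + i] + w * unit[i]
        out.extend(unit[n:])
    return out


def synthesize(phones: Sequence[str], inventory: Dict[str, List[float]],
               overlap: int = 32) -> List[float]:
    """Look up one stored waveform per diphone and join them in order."""
    diphones = to_diphones(phones)
    missing = [d for d in diphones if d not in inventory]
    if missing:
        raise KeyError(f"no recorded unit for: {missing}")
    return crossfade_concat([inventory[d] for d in diphones], overlap)


if __name__ == "__main__":
    # Hypothetical two-unit inventory with constant-valued "waveforms".
    inventory = {"sil-a": [0.0] * 10, "a-sil": [1.0] * 10}
    print(synthesize(["sil", "a", "sil"], inventory, overlap=4))
```

The crossfade is one simple way to hide discontinuities at unit joins; it cannot change the speaker or emotional content of the stored units, which is exactly the limitation noted above.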

Similar Papers

Synthetic Speech Detection Using Neural Networks
Ricardo Reimao ... Vassilios Tzerpos
13 Oct 2021

SpoTNet: A spoofing-aware Transformer Network for Effective Synthetic Speech Detection
Awais Khan ... Khalid Mahmood Malik
12 Jun 2023

ASSD: Synthetic Speech Detection in the AAC Compressed Domain
Amit Kumar Singh Yadav ... Ziyue Xiang
04 Jun 2023

Light Convolutional Neural Network with Feature Genuinization for Detection of Synthetic Speech Attacks
Zhenzong Wu ... Rohan Kumar Das
25 Oct 2020
