Exposing Speech Transsplicing Forgery with Noise Level Inconsistency

Diqun Yan, Mingyu Dong, Jinxing Gao

doi:10.1155/2021/6659371

Open Access

https://doi.org/10.1155/2021/6659371

Journal: Security and Communication Networks
Publication Date: Jan 27, 2021
Citations: 7
License type: CC BY 4.0

Affiliation: Ningbo University

Abstract

Splicing is one of the most common tampering techniques for speech forgery in many forensic scenarios. Some successful approaches have been presented for detecting speech splicing when the spliced segments have different signal-to-noise ratios (SNRs). However, when the SNRs of the spliced segments are close or even the same, no effective detection methods have been reported yet. In this study, the noise inconsistency between the original speech and a segment inserted from other speech is utilized to detect the splicing trace. First, the noise signal of the suspect speech is extracted by a parameter-optimized noise estimation algorithm. Second, statistical Mel frequency features are extracted from the estimated noise signal. Finally, the spliced region is located by applying a change point detection algorithm to the features of the estimated noise signal. The effectiveness of the proposed method is evaluated on a well-designed speech splicing dataset. The comparative experimental results show that the proposed algorithm achieves better detection performance than other algorithms.
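
The final step of the abstract, locating the spliced region with a change point detection algorithm, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a squared-error segment cost and a fixed penalty per change point, solved by optimal partitioning; the function name `detect_change_points` and the `penalty` parameter are hypothetical, and the paper's own penalty cost function is not reproduced here.

```python
import numpy as np


def detect_change_points(values, penalty):
    """Return indices where a new segment starts.

    Minimises (sum of segment costs) + penalty * (number of change points).
    The segment cost is the squared deviation from the segment mean, which is
    an assumption made for this sketch.
    """
    x = np.asarray(values, dtype=float)
    n = len(x)
    # Prefix sums give the cost of any segment x[a:b] in constant time.
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    def cost(a, b):
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = np.full(n + 1, np.inf)   # best[t]: optimal penalised cost of x[:t]
    best[0] = -penalty              # so the first segment carries no penalty
    last = np.zeros(n + 1, dtype=int)
    for t in range(1, n + 1):
        for s in range(t):
            candidate = best[s] + cost(s, t) + penalty
            if candidate < best[t]:
                best[t], last[t] = candidate, s

    # Walk back through the stored segment starts.
    change_points = []
    t = n
    while t > 0:
        s = int(last[t])
        if s > 0:
            change_points.append(s)
        t = s
    return sorted(change_points)


if __name__ == "__main__":
    feature = np.array([0.0] * 50 + [5.0] * 30 + [0.0] * 50)
    print(detect_change_points(feature, penalty=10.0))  # [50, 80]
```

A larger `penalty` suppresses spurious change points at the risk of missing short insertions; the double loop is quadratic in the sequence length, which is acceptable for frame-level feature sequences of a single utterance.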

Highlights

With the wide spread of social networks and the rapid development of powerful audio editing tools, digital speech can be accessed, manipulated, and distributed
$\hat{P}_N(\lambda, k)$ to obtain the enhanced speech signal $\hat{s}(i)$. Therefore, the noise signal $\hat{n}(i)$ can be estimated from the noisy speech $y(i)$ and the enhanced speech $\hat{s}(i)$. Then, the estimated noise $\hat{n}(i)$ is framed and windowed, and for each frame, M-dimensional Mel frequency cepstral coefficients (MFCCs) are calculated. The variance sequence $V = (V_1, V_2, \ldots, V_n)$ of the MFCCs is obtained and taken as the input of the change point detection algorithm, and the penalty cost function is constructed by equation (11)
Splicing Dataset. The transsplicing speech samples in this study are created from the NOIZEUS speech corpus [21], which is derived from clean speech contaminated by various kinds of real-world noise. The clean speech consists of 30 IEEE sentences produced by three male and three female speakers. The noise signals in NOIZEUS come from the AURORA-2 database [22], including noise from train stations, airports, exhibition halls, streets, and restaurants, as well as car noise, noise from commuter trains, and babble noise from multiperson speech
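
The framing, windowing, and MFCC variance step mentioned in the highlights can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the frame length, hop size, filter count, and number of coefficients are all hypothetical defaults, the 8 kHz sampling rate is an assumed parameter, and the variance is taken per frame across the M coefficients, which is one possible reading of the variance sequence.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for j in range(n_filters):
        lo, mid, hi = bins[j], bins[j + 1], bins[j + 2]
        for k in range(lo, mid):          # rising edge
            fb[j, k] = (k - lo) / max(mid - lo, 1)
        for k in range(mid, hi):          # falling edge
            fb[j, k] = (hi - k) / max(hi - mid, 1)
    return fb


def mfcc_variance_sequence(noise, sr=8000, frame_len=256, hop=128,
                           n_filters=20, n_mfcc=12):
    """Per-frame variance of the MFCCs of an estimated noise signal.

    Assumes len(noise) >= frame_len. All defaults are illustrative.
    """
    noise = np.asarray(noise, dtype=float)
    n_frames = 1 + (len(noise) - frame_len) // hop
    window = np.hamming(frame_len)
    fb = mel_filterbank(n_filters, frame_len, sr)
    # DCT-II basis for coefficients 1..n_mfcc (the 0th coefficient is skipped).
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(1, n_mfcc + 1), 2 * n + 1)
                 / (2 * n_filters))
    variances = np.empty(n_frames)
    for t in range(n_frames):
        frame = noise[t * hop: t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2 / frame_len
        log_mel = np.log(fb @ power + 1e-12)   # small offset avoids log(0)
        variances[t] = np.var(dct @ log_mel)
    return variances


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = mfcc_variance_sequence(rng.standard_normal(8000))
    print(v.shape)
```

The resulting sequence has one value per frame and could serve as the input to a change point detector.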

Summary

Introduction

With the wide spread of social networks and the rapid development of powerful audio editing tools (such as Adobe Audition and GoldWave), digital speech can be accessed, manipulated, and distributed. Then, the variances of the Mel frequency cepstral coefficients (MFCCs) [15] of the estimated noise signal are calculated as the detection features. The spliced region is located by a change point detection algorithm based on the penalty cost function [16]. The main work of this study is then described, covering noise estimation, feature extraction, and the change point detection algorithm in detail. The estimated noise signal $\hat{n}(i)$ can be obtained by subtracting the enhanced speech $\hat{s}(i)$ from the noisy speech $y(i)$, that is, $\hat{n}(i) = y(i) - \hat{s}(i)$. In the noise estimation process of each frame, the minimum $p_{\min}(\lambda, k)$ in the window is tracked, and the value obtained by the tracking is used to continuously update $p_{\min}(\lambda, k)$. It can be seen from the above analysis that reasonable adjustment
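
The minimum tracking and the residual $\hat{n}(i) = y(i) - \hat{s}(i)$ described above can be sketched as follows. This is a simplified illustration and not the parameter-optimized algorithm of the paper: it assumes first-order recursive smoothing of the noisy power spectrum followed by a sliding-window minimum, and the names `track_minimum`, `noise_residual`, `alpha`, and `window` are hypothetical.

```python
import numpy as np


def track_minimum(power, alpha=0.85, window=50):
    """Track p_min(lambda, k) over a sliding window of frames.

    power: array of shape (n_frames, n_bins) holding |Y(lambda, k)|^2.
    alpha and window are illustrative values, not the paper's settings.
    """
    power = np.asarray(power, dtype=float)
    # Recursive smoothing along the frame axis.
    smoothed = np.empty_like(power)
    smoothed[0] = power[0]
    for lam in range(1, len(power)):
        smoothed[lam] = alpha * smoothed[lam - 1] + (1.0 - alpha) * power[lam]
    # Minimum of the smoothed power over the most recent `window` frames.
    p_min = np.empty_like(power)
    for lam in range(len(power)):
        start = max(0, lam - window + 1)
        p_min[lam] = smoothed[start: lam + 1].min(axis=0)
    return p_min


def noise_residual(noisy, enhanced):
    """Estimated noise as the sample-wise difference y(i) - s_hat(i)."""
    return np.asarray(noisy, dtype=float) - np.asarray(enhanced, dtype=float)


if __name__ == "__main__":
    power = np.ones((60, 4))
    power[20:26, 1] = 100.0            # short speech-like burst in one bin
    p_min = track_minimum(power)
    print(p_min[:, 1].max())           # stays near the noise floor of 1.0
```

A short burst does not raise the tracked minimum as long as the window still contains frames at the noise floor, which is why the window length is a key parameter to tune.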

Minimum search

Experimental Results

Conclusion and Future Work
