Complementary regional energy features for spoofed speech detection

Gökay Dişken

doi:10.1016/j.csl.2023.101602

Abstract

Automatic speaker verification systems have been found to be vulnerable to spoofing attacks such as voice conversion, text-to-speech, and replayed speech. Because the security of biometric systems is vital, many countermeasures have been developed for spoofed speech detection. To keep pace with recent developments in speech synthesis, publicly available datasets have become increasingly challenging (e.g., the ASVspoof 2019 and 2021 datasets). These datasets also cover a variety of replay attack configurations, since replay attacks require no expertise and are therefore easily performed. This work utilizes regional energy features, which are shown experimentally to be more effective than traditional frame-based energy features. The proposed energy features are independent of the utterance length and are extracted over nonoverlapping time-frequency regions of the magnitude spectrum. Several configurations are considered in the experiments to verify the contribution of the regional energy features to the performance. First, a light convolutional neural network – long short-term memory (LCNN-LSTM) model with linear frequency cepstral coefficients is used to determine the optimal number of regional energy features. Then, an SE-Res2Net model with log power spectrogram features is used; it achieves results comparable to the state of the art for the ASVspoof 2019 logical access condition. The physical access condition of the ASVspoof 2019 dataset and the logical access and deepfake conditions of the ASVspoof 2021 dataset are also used in the experiments. The regional energy features improve performance under all conditions with almost no additional computational or memory load (less than a 1% increase in model size for SE-Res2Net). The main advantages of the regional energy features are i) capturing nonspeech segments and ii) extracting band-limited information; both aspects are found to be discriminative for spoofed speech detection.
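The abstract states only that the regional energy features are extracted over nonoverlapping time-frequency regions of the magnitude spectrum and do not depend on utterance length. The sketch below illustrates that idea under stated assumptions and is not the author's implementation: it assumes a Hann-windowed STFT magnitude, a uniform grid of `n_freq` × `n_time` blocks, and the log mean power of each block as the feature. The function names, the 4 × 4 grid, `n_fft=512`, `hop=160`, and the 16 kHz sampling rate are all hypothetical choices made for illustration.

```python
import numpy as np


def magnitude_spectrogram(signal, n_fft=512, hop=160):
    """Hann-windowed STFT magnitude, shape (n_fft // 2 + 1, n_frames).
    Window and frame settings are illustrative assumptions."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < n_fft:  # pad very short inputs to one full frame
        signal = np.pad(signal, (0, n_fft - len(signal)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def regional_energy(mag, n_freq=4, n_time=4, eps=1e-10):
    """Log mean power of each block in an n_freq x n_time grid of
    nonoverlapping time-frequency regions (hypothetical layout)."""
    if mag.shape[0] < n_freq or mag.shape[1] < n_time:
        raise ValueError("spectrogram too small for the requested grid")
    feats = []
    # np.array_split tolerates sizes that do not divide evenly
    for band in np.array_split(mag, n_freq, axis=0):
        for region in np.array_split(band, n_time, axis=1):
            feats.append(np.log(np.mean(region ** 2) + eps))
    # Fixed length n_freq * n_time, whatever the number of frames
    return np.array(feats)


rng = np.random.default_rng(0)
one_sec = rng.standard_normal(16000)   # 1 s at an assumed 16 kHz rate
four_sec = rng.standard_normal(64000)  # 4 s at the same rate
print(regional_energy(magnitude_spectrogram(one_sec)).shape)   # (16,)
print(regional_energy(magnitude_spectrogram(four_sec)).shape)  # (16,)
```

Because the grid size is fixed rather than the block size, a 1 s and a 4 s utterance both yield 16 values. A block that covers leading or trailing silence receives a very low energy, and each row of the grid summarizes a single frequency band; this loosely mirrors the two advantages the abstract names. How the vector is combined with the LFCC or log power spectrogram inputs is not described in the abstract and is not modeled here.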

Journal: Computer Speech & Language | Publication Date: Dec 16, 2023
Citations: 4

Similar Papers

A robust voice spoofing detection system using novel CLS-LBP features and LSTM
Hussain Dawood ... Ali Javed
Journal of King Saud University - Computer and Information Sciences | VOL. 34
22 Mar 2022

Voice Spoofing Countermeasure for Logical Access Attacks Detection
Tuba Arif ... Mohammed Alhameed
IEEE Access | VOL. 9
01 Jan 2020

Static–dynamic features and hybrid deep learning models based spoof detection system for ASV
Aakshi Mittal ... Mohit Dua
Complex & Intelligent Systems | VOL. 8
19 Nov 2021
Voice Spoofing Countermeasure for Synthetic Speech Detection
Farman Hassan ... Ali Javed
05 Apr 2021