Constant Q Cepstral Coefficients Research Articles

In this paper, we propose combining Amplitude Modulation and Frequency Modulation (AM-FM) features for the replay Spoof Speech Detection (SSD) task. The AM components are known to be affected by noise (in this case, noise introduced by the replay mechanism). In particular, we exploit this degradation of the AM component, together with the corresponding Instantaneous Frequency (IF), for the SSD task. Thus, the novelty of the proposed Amplitude Weighted Frequency Cepstral Coefficients (AWFCC) feature set lies in using frequency components along with squared weighted amplitude components, which are degraded by replay noise. Because the AWFCC feature set captures both AM and FM information together, it provides discriminative information about the spectral characteristics. The experiments were performed on the publicly available ASVspoof 2017 challenge version 1.0 and 2.0 databases. We compared the results of the proposed feature set with other state-of-the-art feature sets, such as Constant Q Cepstral Coefficients (CQCC), Linear Frequency Cepstral Coefficients (LFCC), and Mel Frequency Cepstral Coefficients (MFCC), using a simple Gaussian Mixture Model (GMM) classifier. Used individually, the AWFCC feature set obtained a lower % Equal Error Rate (EER) than the other feature sets on both the version 1.0 and 2.0 databases. Furthermore, we used score-level fusion to exploit possible complementary information between two feature sets and reduce the % EER further. The score-level fusion of the CQCC and AWFCC feature sets gave 5.75 % and 10.42 % EER on the development and evaluation sets, respectively, of the ASVspoof 2017 version 2.0 database. Moreover, on the evaluation dataset, we also studied the performance of the proposed feature set across different Replay Configurations (RC), namely acoustic environments, playback devices, and recording devices.
For all threat levels to the automatic speaker verification (ASV) system (i.e., low, medium, and high), the proposed feature set performed better than the existing state-of-the-art feature sets.
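The % EER figures above mark the operating point where the countermeasure's false acceptance rate (spoofs accepted) equals its false rejection rate (genuine speech rejected). The abstract does not state how the CQCC and AWFCC scores were fused, so the Python sketch below assumes one common choice: a weighted sum of z-normalised scores from two systems evaluated on the same trials. The scores are synthetic, the convention that a higher score means "genuine" is an assumption, `compute_eer` and `fuse_scores` are hypothetical helper names, and the EER is found with a simple threshold sweep rather than the interpolated estimate some toolkits use.

```python
import numpy as np


def compute_eer(genuine_scores, spoof_scores):
    """Approximate EER by sweeping every observed score as a threshold.

    Assumes higher scores mean 'genuine'. A trial is accepted when its
    score >= threshold; the EER is taken where FAR and FRR are closest.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    far = np.array([np.mean(spoof >= t) for t in thresholds])   # spoofs accepted
    frr = np.array([np.mean(genuine < t) for t in thresholds])  # genuine rejected
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted-sum score-level fusion (assumed rule, not the paper's stated one).

    scores_a and scores_b hold two systems' scores for the SAME trials in the
    same order. Each system is z-normalised over all its trials so that neither
    dominates because of its scale; `weight` is a hypothetical tuning knob.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return weight * a + (1.0 - weight) * b


# Synthetic example: 1 = genuine, 0 = spoof. The two "systems" share the same
# signal but have independent noise, mimicking complementary feature sets.
rng = np.random.default_rng(0)
labels = np.array([1] * 200 + [0] * 200)
cqcc_scores = labels + rng.normal(0.0, 1.0, labels.size)
awfcc_scores = labels + rng.normal(0.0, 1.0, labels.size)
fused = fuse_scores(cqcc_scores, awfcc_scores)

for name, s in [("CQCC", cqcc_scores), ("AWFCC", awfcc_scores), ("fused", fused)]:
    eer = compute_eer(s[labels == 1], s[labels == 0])
    print(f"{name}: {100 * eer:.2f} % EER")
```

In practice the fusion weight would be tuned on the development set and then kept fixed for the evaluation set, so that the reported evaluation EER is not optimistically biased.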

Automatic speaker verification (ASV) systems are highly vulnerable to presentation attacks, also called spoofing attacks. Replay is among the simplest attacks to mount, yet it is difficult to detect reliably. The poor generalization of spoofing countermeasures (CMs) has driven the community to study various alternative deep learning CMs. Most of them are supervised approaches that learn a human-spoof discriminator. In this paper, we advocate a different, deep generative approach that leverages powerful unsupervised manifold learning for classification. The potential benefits include the ability to sample new data and to gain insights into the latent features of genuine and spoofed speech. To this end, we propose using variational autoencoders (VAEs) as an alternative backend for replay attack detection, via three alternative models that differ in their class conditioning. The first, similar to the use of Gaussian mixture models (GMMs) in spoof detection, trains two VAEs independently, one for each class. The second trains a single conditional model (C-VAE) by injecting a one-hot class label vector into the encoder and decoder networks. Our final proposal integrates an auxiliary classifier to guide the learning of the latent space. Our experimental results using constant-Q cepstral coefficient (CQCC) features on the ASVspoof 2017 and 2019 physical access subtask datasets indicate that the C-VAE offers a substantial improvement over training a separate VAE for each class. On the 2019 dataset, the C-VAE outperforms the VAE and the baseline GMM by an absolute 9-10% in both the equal error rate (EER) and tandem detection cost function (t-DCF) metrics. Finally, we propose VAE residuals, the absolute difference between the original input and its reconstruction, as features for spoofing detection.
The proposed frontend approach, augmented with a convolutional neural network classifier, demonstrated a substantial improvement over the VAE backend use case.
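The C-VAE class conditioning and the VAE-residual features described above come down to two simple array operations, sketched below in Python with NumPy. The label convention (0 = genuine, 1 = spoof), the helper names `one_hot`, `condition_input` and `vae_residual`, and the (frames × coefficients) feature shape are hypothetical, and no trained VAE is included. To keep the example runnable, a rank-5 PCA reconstruction stands in for the VAE decoder output; PCA is a different, linear technique used here only as a placeholder.

```python
import numpy as np


def one_hot(label, num_classes=2):
    """One-hot class vector (hypothetical convention: 0 = genuine, 1 = spoof)."""
    vec = np.zeros(num_classes)
    vec[label] = 1.0
    return vec


def condition_input(frames, label, num_classes=2):
    """C-VAE style conditioning: append the class one-hot vector to every frame.

    frames: (T, D) feature matrix, e.g. T frames of D CQCC coefficients.
    Returns a (T, D + num_classes) matrix for the encoder; in a C-VAE the same
    one-hot vector would also be appended to the latent code before decoding.
    """
    frames = np.asarray(frames, dtype=float)
    code = np.tile(one_hot(label, num_classes), (frames.shape[0], 1))
    return np.concatenate([frames, code], axis=1)


def vae_residual(x, x_hat):
    """Residual feature: absolute difference between input and reconstruction."""
    return np.abs(np.asarray(x, dtype=float) - np.asarray(x_hat, dtype=float))


# Synthetic stand-in for a (T=100, D=20) CQCC matrix.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 20))

# Placeholder reconstruction: rank-5 PCA, NOT a VAE.
# With a trained model this would be decoder(encoder(x)).
mean = x.mean(axis=0)
u, s, vt = np.linalg.svd(x - mean, full_matrices=False)
x_hat = mean + (u[:, :5] * s[:5]) @ vt[:5]

residual = vae_residual(x, x_hat)  # same shape as x
print(condition_input(x, label=1).shape)
print(residual.shape)
```

In the setup the abstract describes, the residual matrix, rather than the raw CQCC frames, would be the input to the convolutional neural network classifier.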

Articles published on Constant Q Cepstral Coefficients

Combination of Amplitude and Frequency Modulation Features for Presentation Attack Detection

Discriminative features based on modified log magnitude spectrum for playback speech detection

Deep generative variational autoencoding for replay spoof detection in automatic speaker verification

Device Feature Extraction Based on Parallel Neural Network Training for Replay Spoofing Detection

Extraction of Octave Spectra Information for Spoofing Attack Detection

Long-term high frequency features for synthetic speech detection

Improving anti-spoofing with octave spectrum and short-term spectral statistics information

Development of Machine Learning for Asthmatic and Healthy Voluntary Cough Sounds: A Proof of Concept Study

A Comparison of Features for Replay Attack Detection

Low frequency frame-wise normalization over constant-Q transform for playback speech detection

Spoofing Detection in Automatic Speaker Verification Systems Using DNN Classifiers and Dynamic Acoustic Features.

Data selection for i-vector based automatic speaker verification anti-spoofing

Synthetic speech detection using fundamental frequency variation and spectral features

Constant Q cepstral coefficients: A spoofing countermeasure for automatic speaker verification