Deep generative variational autoencoding for replay spoof detection in automatic speaker verification

Bhusan Chettri, Tomi Kinnunen, Emmanouil Benetos

doi:10.1016/j.csl.2020.101092

Abstract

Automatic speaker verification (ASV) systems are highly vulnerable to presentation attacks, also called spoofing attacks. Replay is among the simplest attacks to mount, yet it is difficult to detect reliably. The generalization failure of spoofing countermeasures (CMs) has driven the community to study various alternative deep learning CMs. The majority of them are supervised approaches that learn a human-spoof discriminator. In this paper, we advocate a different, deep generative approach that leverages powerful unsupervised manifold learning for classification. The potential benefits include the possibility to sample new data and to obtain insights into the latent features of genuine and spoofed speech. To this end, we propose to use variational autoencoders (VAEs) as an alternative backend for replay attack detection, via three alternative models that differ in their class-conditioning. The first one, similar to the use of Gaussian mixture models (GMMs) in spoof detection, is to independently train two VAEs, one for each class. The second one is to train a single conditional model (C-VAE) by injecting a one-hot class label vector into the encoder and decoder networks. Our final proposal integrates an auxiliary classifier to guide the learning of the latent space. Our experimental results using constant-Q cepstral coefficient (CQCC) features on the ASVspoof 2017 and 2019 physical access subtask datasets indicate that the C-VAE offers substantial improvement in comparison to training two separate VAEs, one for each class. On the 2019 dataset, the C-VAE outperforms the VAE and the baseline GMM by an absolute 9-10% in both equal error rate (EER) and tandem detection cost function (t-DCF) metrics. Finally, we propose VAE residuals, the absolute difference of the original input and the reconstruction, as features for spoofing detection. The proposed frontend approach augmented with a convolutional neural network classifier demonstrated substantial improvement over the VAE backend use case.
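
Two of the ideas in the abstract, the one-hot class conditioning used by the C-VAE and the VAE residual features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names (`one_hot`, `condition`, `vae_residual`), the toy feature dimension, and the scaled copy standing in for a decoder output are all assumptions made purely for illustration.

```python
import numpy as np

def one_hot(labels, num_classes=2):
    """Return an (N, num_classes) one-hot matrix for integer class labels."""
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    out[np.arange(len(labels)), labels] = 1.0
    return out

def condition(x, labels, num_classes=2):
    """Append the one-hot class label to each feature vector.

    In a conditional VAE the same label vector would typically also be
    appended to the latent code before decoding (assumed here, not shown).
    """
    return np.concatenate([x, one_hot(labels, num_classes)], axis=1)

def vae_residual(x, x_hat):
    """Absolute difference between the input and its reconstruction."""
    return np.abs(x - x_hat)

# Toy data: 4 frames of 6-dimensional features (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6)).astype(np.float32)
labels = np.array([0, 1, 1, 0])  # 0 = genuine, 1 = spoof (assumed coding)

enc_in = condition(x, labels)    # conditioned encoder input, shape (4, 8)
x_hat = 0.9 * x                  # stand-in for a decoder reconstruction
res = vae_residual(x, x_hat)     # residual features, same shape as x
```

In a real system `x_hat` would come from a trained decoder, and the residual `res` would be passed to a downstream classifier such as the convolutional network mentioned in the abstract.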

Journal: Computer Speech & Language
Publication Date: Mar 19, 2020
Citations: 12

Similar Papers

Spoofing Detection in Automatic Speaker Verification Systems Using DNN Classifiers and Dynamic Acoustic Features
Hong Yu ... Zheng-Hua Tan
IEEE Transactions on Neural Networks and Learning Systems | VOL. 29
04 Dec 2017

Compressed high dimensional features for speaker spoofing detection
Yuanjun Zhao ... Victor Sreeram
01 Dec 2017

Static–dynamic features and hybrid deep learning models based spoof detection system for ASV
Aakshi Mittal ... Mohit Dua
Complex & Intelligent Systems | VOL. 8
19 Nov 2021

Data selection for i-vector based automatic speaker verification anti-spoofing
Cemal Hanilçi
Digital Signal Processing | VOL. 72
24 Oct 2017
