Abstract
With the increasing use of voice as a biometric, it has become imperative to develop countermeasures to thwart malicious spoofing attacks on speaker recognition systems. Even though significant research effort over the last few years has been dedicated to developing countermeasures to detect and deflect spoofing attacks, the problem is far from solved. While deep learning techniques have been successfully applied in anti-spoofing research, they suffer from a data scarcity issue: large amounts of labeled training data are required to build a robust model. In this paper, we investigate a domain adaptation approach for deep architectures in both a supervised setting, where we use labeled data, and an unsupervised setting, where we assume only unlabeled data when transferring knowledge from the source to the target domain. Specifically, we employ convolutional neural networks (CNNs) as back-end classifiers for spoofed speech detection. For supervised domain adaptation, we propose joint neural network training in which the weights are shared between the source and target streams, together with an additional domain regularizer. In the unsupervised domain adaptation scenario, the weights are not shared, in order to explicitly model the domain shift; instead, they are related by weight regularizers that account for the difference between the two domains. We conduct extensive cross-database (domain mismatch) experiments using the ASVspoof 2015 and BTAS 2016 datasets to demonstrate the generalization capability of the proposed deep domain architectures for spoofing detection. Experimental results reveal that the proposed architectures can generalize across databases in both the supervised and unsupervised adaptation scenarios.
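The weight-regularization idea described for the unsupervised scenario can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the function names (`weight_regularizer`, `total_loss`) and the squared-Frobenius form of the penalty are assumptions chosen for clarity.

```python
import numpy as np

def weight_regularizer(w_source, w_target, lam=0.1):
    # Squared-Frobenius penalty relating the (unshared) target-stream
    # weights to the source-stream weights: lam * ||W_s - W_t||_F^2.
    # A small penalty lets the streams diverge to model the domain shift
    # while still transferring knowledge from source to target.
    return lam * float(np.sum((np.asarray(w_source) - np.asarray(w_target)) ** 2))

def total_loss(task_loss, w_source, w_target, lam=0.1):
    # Overall objective: target-domain task loss plus the weight tie.
    return task_loss + weight_regularizer(w_source, w_target, lam)

# Identical weights incur no penalty; diverging weights are penalized.
w_s = np.ones((2, 2))
w_t = np.zeros((2, 2))
print(weight_regularizer(w_s, w_s))        # 0.0
print(weight_regularizer(w_s, w_t))        # 0.1 * 4 = 0.4
print(total_loss(1.5, w_s, w_t, lam=0.1))  # 1.9
```

In the supervised setting described above, by contrast, the two streams share one weight matrix, so this tying term is unnecessary and a domain regularizer on the feature distributions is used instead.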