A Kernel Density Estimation Based Loss Function and its Application to ASV-Spoofing Detection

Alejandro Gomez-Alanis,Antonio M Peinado,Jose A Gonzalez-Lopez

doi:10.1109/access.2020.3000641

Open Access

https://doi.org/10.1109/access.2020.3000641

Abstract

Biometric systems are exposed to spoofing attacks which may compromise their security, and voice biometrics, also known as automatic speaker verification (ASV), is no exception. Replay, synthesis and voice conversion attacks cause false acceptances that can be detected by anti-spoofing systems. Recently, deep neural networks (DNNs) that extract embedding vectors have outperformed conventional systems in both ASV and anti-spoofing tasks. In this work, we develop a new concept of loss function for training DNNs which is based on kernel density estimation (KDE) techniques. The proposed loss functions estimate the probability density function (pdf) of every training class in each mini-batch, and compute a log-likelihood matrix between the embedding vectors and the pdfs of all training classes within the mini-batch in order to obtain the KDE-based loss. To evaluate our proposal for spoofing detection, experiments were carried out on the recent ASVspoof 2019 corpus, including both logical and physical access scenarios. The experimental results show that a DNN-based anti-spoofing system trained with our proposed loss functions clearly outperforms the same system trained with other well-known loss functions. Moreover, the results also show that the proposed loss functions are effective for different types of neural network architectures.
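
The mini-batch procedure described in the abstract (one KDE-estimated pdf per class, then a log-likelihood matrix between embeddings and class pdfs) can be illustrated with a minimal NumPy sketch. Several details here are assumptions rather than the authors' formulation: the Gaussian kernel, the fixed `bandwidth`, the equal class priors, and the final softmax cross-entropy over the log-likelihoods. The function names `kde_log_likelihood_matrix` and `kde_loss` are hypothetical.

```python
import numpy as np


def kde_log_likelihood_matrix(embeddings, labels, num_classes, bandwidth=1.0):
    """Return an (N, C) matrix whose entry [i, c] is log p(x_i | class c).

    Each class pdf is a Gaussian KDE built from the mini-batch embeddings of
    that class. Kernel and fixed bandwidth are assumptions of this sketch.
    Note that each sample contributes to the KDE of its own class.
    """
    n, d = embeddings.shape
    # Pairwise squared Euclidean distances, shape (N, N).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    # Log of the isotropic Gaussian kernel, including its normalising constant.
    log_kernel = (-0.5 * sq_dist / bandwidth ** 2
                  - 0.5 * d * np.log(2.0 * np.pi * bandwidth ** 2))

    log_lik = np.full((n, num_classes), -np.inf)
    for c in range(num_classes):
        members = labels == c
        if not members.any():
            continue  # class absent from this mini-batch: likelihood stays 0
        cols = log_kernel[:, members]  # kernels against class-c samples
        # Stable log of the mean kernel value (log-sum-exp trick).
        m = cols.max(axis=1, keepdims=True)
        log_lik[:, c] = (m[:, 0] + np.log(np.exp(cols - m).sum(axis=1))
                         - np.log(members.sum()))
    return log_lik


def kde_loss(embeddings, labels, num_classes, bandwidth=1.0):
    """Mean negative log-posterior of the true class (assumed loss form)."""
    log_lik = kde_log_likelihood_matrix(embeddings, labels, num_classes, bandwidth)
    # Normalise over classes with equal priors via a stable log-sum-exp.
    m = log_lik.max(axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(log_lik - m).sum(axis=1))
    log_post = log_lik[np.arange(len(labels)), labels] - log_norm
    return -log_post.mean()


# Toy mini-batch: two tight, well-separated clusters in a 2-D embedding space.
batch = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
loss_separated = kde_loss(batch, np.array([0, 0, 1, 1]), num_classes=2)
loss_mixed = kde_loss(batch, np.array([0, 1, 0, 1]), num_classes=2)
```

In this toy batch the loss is close to zero when the labels match the clusters and much larger when each class is spread over both clusters, which is the behaviour a density-based loss should reward. For actual training, the same computation would need to be written in an automatic-differentiation framework so that gradients reach the embedding network.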

Highlights

Biometric authentication [1] aims to authenticate the identity claimed by a given individual based on samples measured from biological processes and/or organs.
It is not straightforward to select the most appropriate distance measure or embedding normalization technique. These loss functions typically require an extra hyper-parameter, called the margin, which is difficult to optimize. To address all these issues, we propose a new probabilistic loss function for supervised metric learning, where every training class is represented by a probability density function which is estimated through kernel density estimation (KDE) [32]–[34] in each mini-batch.
We used the Light Convolutional Gated Recurrent Neural Network (LC-GRNN) that we proposed in our previous works [9], [19].

Summary

Introduction

Biometric authentication [1] aims to authenticate the identity claimed by a given individual based on samples measured from biological processes and/or organs (e.g., voice, fingerprint, face, etc.). Automatic speaker verification (ASV) techniques verify the claimed identity of a given speaker by recording her/his voice, extracting voiceprints from the voice recordings, and deciding whether the speaker is who s/he claims to be based on the extracted voiceprints and a set of pre-stored voiceprints from enrolled users. Our focus in this work is on spoofing detection for ASV, where an impostor could gain fraudulent access to the authentication system by presenting speech resembling the voice of a genuine user. Four types of spoofing attacks have been identified [4]: (i) impersonation (i.e., mimicking the voice of a target speaker), (ii) replay (i.e., using pre-recorded voice of a target user), (iii) text-to-speech synthesis (TTS), and (iv) voice conversion (VC), where the last two use TTS or VC systems to generate artificial speech resembling the voice of a legitimate user.

Methods

Results

Conclusion