Abstract

As automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, they are typically used in conjunction with spoofing countermeasure (CM) systems to improve security. For example, the CM can first determine whether the input is human speech, then the ASV can determine whether this speech matches the claimed speaker's identity. The performance of such a tandem system can be measured with a tandem detection cost function (t-DCF). However, ASV and CM systems are usually trained separately, using different metrics and data, which does not optimize their combined performance. In this work, we propose to optimize the tandem system directly by creating a differentiable version of t-DCF and employing techniques from reinforcement learning. The results indicate that these approaches offer better outcomes than fine-tuning, with our method providing a 20% relative improvement in the t-DCF on the ASVspoof19 dataset in a constrained setting.

Highlights

  • An automatic speaker verification (ASV) system attempts to verify if a given speech utterance matches the claimed identity [1]

  • Spoofing countermeasure (CM) systems aim to detect the crafted audio samples used in spoofing attacks, and improve security when combined with an ASV system [3]

  • Given its success with other metrics, we extend the idea of soft detection cost function (DCF) to tandem detection cost function (t-DCF) to assess its applicability in tandem optimization
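As a rough illustration of what a "soft" detection cost means, the sketch below replaces hard threshold decisions with sigmoids, so that the miss and false-alarm rates become smooth functions of the scores. It is not the paper's formulation: the function names, the sharpness parameter `alpha`, and the cost and prior values are assumptions chosen for illustration, and the actual t-DCF combines ASV and CM error rates with its own coefficients [4].

```python
import math

def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def soft_error_rates(target_scores, nontarget_scores, threshold, alpha=10.0):
    """Smooth approximations of the miss and false-alarm rates.

    A hard count (score below or above the threshold) is replaced by a
    sigmoid, so the rates vary smoothly with the scores and the threshold.
    alpha is a hypothetical sharpness parameter: larger values bring the
    sigmoid closer to the hard step.
    """
    p_miss = sum(sigmoid(alpha * (threshold - s)) for s in target_scores) / len(target_scores)
    p_fa = sum(sigmoid(alpha * (s - threshold)) for s in nontarget_scores) / len(nontarget_scores)
    return p_miss, p_fa

def soft_dcf(target_scores, nontarget_scores, threshold,
             c_miss=1.0, c_fa=1.0, p_target=0.5, alpha=10.0):
    """Soft detection cost: weighted sum of the soft miss and false-alarm rates.

    The cost and prior values are placeholders, not values from the paper.
    """
    p_miss, p_fa = soft_error_rates(target_scores, nontarget_scores, threshold, alpha)
    return c_miss * p_target * p_miss + c_fa * (1.0 - p_target) * p_fa
```

With well-separated scores the soft cost approaches zero, and with inverted scores it approaches the maximum weighted cost, which is the behaviour a hard-threshold DCF would show as well. The difference is that the soft version has useful gradients in between.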

Summary

Introduction

An automatic speaker verification (ASV) system attempts to verify whether a given speech utterance matches the claimed identity [1]. Spoofing countermeasure (CM) systems aim to detect the crafted audio samples used in spoofing attacks, and improve security when combined with an ASV system [3]. This improvement is achieved by training the two systems separately, using them in conjunction with each other, and evaluating their performance with a tandem detection cost function (t-DCF) [4]. Although they are evaluated using this tandem metric, the original ASV and CM systems are not trained to minimize the t-DCF. Some attack systems used to generate spoof samples can fool the CM yet still be detected by the ASV system, as is the case with system A17 in the ASVspoof dataset [5].
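The tandem arrangement can be sketched as a simple cascade: the CM first decides whether the input looks like genuine human speech, and only then does the ASV decide whether it matches the claimed speaker. The function name, thresholds, and score values below are hypothetical, and scores are assumed to follow a "higher means more likely accepted" convention. This illustrates the cascade only, not the systems used in the paper.

```python
def tandem_accept(cm_score, asv_score, cm_threshold=0.0, asv_threshold=0.0):
    """Accept a trial only if both the CM and the ASV accept it.

    Assumes higher scores mean "more likely bona fide" (CM) and
    "more likely the claimed speaker" (ASV). Thresholds are placeholders.
    """
    if cm_score < cm_threshold:
        return False  # rejected by the countermeasure as a suspected spoof
    return asv_score >= asv_threshold  # CM passed; the ASV makes the final call

# Hypothetical trials given as (cm_score, asv_score) pairs.
trials = {
    "bona fide target": (2.1, 1.8),      # passes both systems
    "spoof caught by CM": (-1.5, 1.2),   # the CM rejects it
    "spoof missed by CM": (0.7, -0.9),   # fools the CM, but the ASV rejects it
}
decisions = {name: tandem_accept(cm, asv) for name, (cm, asv) in trials.items()}
```

The third trial mirrors the situation described for attacks that fool the CM but are still detected by the ASV system: the cascade rejects the trial even though the CM alone would have let it through.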

