Abstract

A speaker verification system automatically accepts or rejects a claimed speaker identity based on a speech sample. Recent major progress in speaker verification has led to mass-market adoption, for example for user authentication on smartphones and in online commerce. A major concern when deploying speaker verification technology is whether a system is robust against spoofing attacks. Speaker verification studies have provided good insight into speaker characterization, which has contributed to the progress of voice conversion technology. Unfortunately, voice conversion has become one of the most easily accessible techniques for carrying out spoofing attacks and therefore presents a threat to speaker verification systems. In this paper, we briefly introduce the fundamentals of voice conversion and speaker verification technologies. We then give an overview of recent spoofing attack studies under different conditions, with a focus on voice conversion spoofing attacks. We also discuss anti-spoofing measures for speaker verification.

Highlights

  • A large number of physical or behavioral attributes, which are distinctive, measurable characteristics to describe human individuals, have been investigated for biometric recognition

  • The majority of past studies use both equal error rate (EER) and false acceptance rate (FAR) as evaluation metrics to measure the vulnerability of speaker verification systems to voice conversion spoofing

  • We present an overview of voice conversion spoofing and anti-spoofing for speaker verification
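
As a rough illustration of the two metrics named in the highlights, the sketch below computes FAR, the false rejection rate (FRR), and an approximate EER from lists of verification scores. It assumes that a higher score means a better match and that a trial is accepted when its score reaches the threshold. The function names and all score values are hypothetical and chosen only for this example; they are not taken from the paper. Reporting the FAR on converted speech at the threshold fixed by the baseline EER is one common evaluation convention, assumed here.

```python
def far_frr(genuine, impostor, threshold):
    """Return (FAR, FRR) when trials with score >= threshold are accepted."""
    # FAR: fraction of impostor trials that are wrongly accepted.
    far = sum(s >= threshold for s in impostor) / len(impostor)
    # FRR: fraction of genuine trials that are wrongly rejected.
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate the EER by scanning every observed score as a threshold.

    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]      # target speaker, true claims
impostor = [0.1, 0.2, 0.3, 0.6]     # zero-effort impostors
spoof = [0.5, 0.65, 0.75, 0.85]     # impostor speech after voice conversion

# Baseline operating point: threshold fixed where FAR and FRR meet.
eer, threshold = equal_error_rate(genuine, impostor)

# Vulnerability: FAR on converted speech at the same fixed threshold.
spoof_far, _ = far_frr(genuine, spoof, threshold)

print(f"baseline EER = {eer:.2f} at threshold {threshold}")
print(f"FAR under spoofing = {spoof_far:.2f}")
```

With these made-up scores the baseline EER is 0.25 at a threshold of 0.6, and the FAR rises to 0.75 once the impostor trials are replaced by converted speech, which is the kind of degradation the spoofing studies measure.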

Summary

INTRODUCTION

A large number of physical or behavioral attributes, which are distinctive, measurable characteristics that describe human individuals, have been investigated for biometric recognition. If the replayed content differs from the specific pass-phrase required by a text-dependent speaker verification (TD-SV) system, it does not pose a threat unless the attacker is able to acquire the target speaker’s voice for that specific pass-phrase, as assumed in [5]. Owing to the rapid development of unit selection [9], statistical parametric [10], and hybrid [11] methods, speech synthesis systems are able to generate speech of acceptable quality that carries the voice characteristics of a given target speaker, such as spectral cues. Unlike replay attacks, human voice mimicking, and speech synthesis, voice conversion transforms one speaker’s (source) voice to sound like that of another speaker (target) without changing the language content.

VOICE CONVERSION TECHNIQUES
SPEAKER VERIFICATION TECHNIQUES
SPOOFING ATTACK STUDIES
ANTI-SPOOFING ATTACK STUDIES
SOFTWARE AND DATABASES
CONCLUSION