Abstract
This paper proposes a new feature set for a playback attack detector (PAD) that safeguards a system protected by a passphrase and speaker verification and remotely accessible from an arbitrary location over an arbitrary telecommunication channel. The feature set, termed VoicedTracks, is a time-frequency map of the most robust harmonic trajectories in an utterance; it serves as an audio fingerprint that can uniquely identify the utterance despite moderate noise and channel distortion. Experimental results are obtained using a specially designed in-house database, and the impact of various noise types and SNR levels is further investigated using a publicly available database. An analysis of playback scores across several combinations of telecommunication channel types, playback devices, and additive noise demonstrates the robustness of the feature set to channel distortion and additive noise, making it suitable for a copy-detection-based PAD (cd-PAD) designed for applications such as telephone banking. Relative to other cd-PADs, the proposed approach was better able to defend against playback attacks when telephone channels were involved. An analysis of its performance across the replay configurations in the ASVspoof 2017 V2 evaluation set suggests that the proposed cd-PAD is highly effective at detecting the playback attacks most likely to spoof the speaker verification system.