Abstract

With the increasing popularity of automatic speaker verification (ASV), the reliability of ASV systems has also gained importance. ASV is vulnerable to various spoofing attacks, especially replay attacks. Thus, recent public competitions and studies on spoofing attack detection for ASV have mainly focused on detecting replay attacks. Generally, replayed speech carries the attributes of one playback device and two recording devices: the playback device, the recording device used by the attacker, and the recording device embedded in the system that verifies input utterances. Therefore, the main attributes that differentiate replayed speech from genuine speech are those of the playback and recording devices used by the attacker. In this paper, we propose a novel replay attack, and a defense against it, based on observation of the general speech-spoofing process. The proposed attack carries only the attribute of the one recording device embedded in an ASV system: genuine speech passes through this recording device only once, whereas the replayed speech produced for the proposed attack passes through the same recording device twice. Because the proposed attack is feasible, it can be treated as a new task when training replay countermeasures, in order to develop a robust ASV protection system. The experimental results show that several existing state-of-the-art replay attack detection systems cannot detect this novel replay attack. However, the same systems detect the new attack successfully when they are retrained on an appropriate dataset designed for the new task.
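The device chains described above can be illustrated with a toy linear model. This is a minimal sketch of our own, not the authors' method: it assumes each device behaves as a linear time-invariant FIR filter, treats the playback stage of the proposed attack as ideal (no coloration), and uses hypothetical impulse responses (`ASV_MIC`, `ATTACKER_MIC`, `LOUDSPEAKER`) and illustrative helper functions with arbitrary values. Under these assumptions, the proposed replay carries only the embedded microphone's response applied twice, so its gain in dB is exactly double that of genuine speech at every frequency.

```python
import cmath
import math
from typing import List

Signal = List[float]


def convolve(x: Signal, h: Signal) -> Signal:
    """Full linear convolution: models passing x through an LTI device h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def apply_chain(x: Signal, devices: List[Signal]) -> Signal:
    """Pass x through each device's impulse response, in order."""
    for h in devices:
        x = convolve(x, h)
    return x


def gain_db(h: Signal, omega: float) -> float:
    """Magnitude response of FIR filter h at normalized frequency omega, in dB."""
    response = sum(hk * cmath.exp(-1j * omega * k) for k, hk in enumerate(h))
    return 20.0 * math.log10(abs(response))


# Hypothetical impulse responses; values are illustrative only.
ASV_MIC = [1.0, 0.5]         # recording device embedded in the ASV system
ATTACKER_MIC = [1.0, -0.3]   # recording device used by the attacker
LOUDSPEAKER = [0.8, 0.2]     # attacker's playback device
IDEAL_PLAYBACK = [1.0]       # assumption: playback adds no coloration

impulse = [1.0]  # unit impulse, so each output is the chain's overall response

genuine = apply_chain(impulse, [ASV_MIC])
conventional_replay = apply_chain(impulse, [ATTACKER_MIC, LOUDSPEAKER, ASV_MIC])
proposed_replay = apply_chain(impulse, [ASV_MIC, IDEAL_PLAYBACK, ASV_MIC])

for name, h in [("genuine", genuine),
                ("conventional replay", conventional_replay),
                ("proposed replay", proposed_replay)]:
    print(f"{name:20s} gain at Nyquist: {gain_db(h, math.pi):6.2f} dB")
```

In this toy model, the proposed replay differs from genuine speech only by a second pass through the same response, rather than by the attacker-side device responses that conventional replay countermeasures learn to pick up.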

Highlights

  • Automatic speaker verification (ASV) is a technique that verifies a user’s identity by analyzing his/her speech

  • To ensure the reliability of our results, we first evaluated our systems on the trials of ASVspoof 2017 version 2 and ASVspoof 2019 PA, which have been widely used in studies on replay attack detection

  • We investigated the difference between genuine and replayed speech: genuine speech passes through the recording device embedded in a system only once, whereas the replayed speech of the proposed attack passes through the same recording device twice


Summary

Introduction

Automatic speaker verification (ASV) is a technique that verifies a user’s identity by analyzing his/her speech. Because it requires only speech, it is relatively convenient compared with other verification techniques. It has been widely used in many smart devices that require user verification, such as smart speakers and smartphones. Competitions on spoofing attacks and countermeasures for ASV have been held regularly, and related studies have been conducted [1]–[6].
