Abstract
With the increasing popularity of automatic speaker verification (ASV), the reliability of ASV systems has also gained importance. ASV is vulnerable to various spoofing attacks, especially replay attacks. Thus, recent public competitions and studies on spoofing attack detection for ASV have mainly focused on detecting replay attacks. Generally, replayed speech carries the attributes of one playback device and two recording devices: the playback device and the recording device used by the attacker, and the recording device embedded in the system that verifies input utterances. Because genuine speech also passes through the system's recording device, the main attributes differentiating replayed speech from genuine speech are those of the attacker's playback and recording devices. In this paper, we propose a novel replay attack, and a defense against it, based on observing the general speech-spoofing process. The proposed attack carries only the attributes of the one recording device embedded in the ASV system: genuine speech passes through this recording device only once, whereas the replayed speech produced for the proposed attack passes through the same recording device twice. Because the proposed attack is feasible, it can be treated as a new task when training replay countermeasures, in order to develop a robust ASV protection system. The experimental results show that several existing state-of-the-art replay attack detection systems cannot detect this novel replay attack. However, the same systems can detect the new attack successfully if they are retrained with an appropriate dataset designed for the new task.
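The device chains described above can be illustrated with a toy signal model. This is a minimal sketch that assumes each device acts as a linear time-invariant (LTI) filter, so that passing through a device amounts to convolving with its impulse response. The impulse-response values, the names `H_SYSTEM_MIC`, `H_ATTACKER_MIC`, `H_LOUDSPEAKER`, and the helper functions are all hypothetical. Treating the playback stage of the proposed attack as ideal (an identity filter) is also an assumption made here for illustration. It is not a detail given in the abstract.

```python
import numpy as np

# Hypothetical FIR impulse responses standing in for real devices.
# Real devices are nonlinear and time-varying; this LTI model is a toy.
H_SYSTEM_MIC = np.array([1.0, 0.5, 0.25])    # recording device embedded in the ASV system
H_ATTACKER_MIC = np.array([1.0, -0.3])       # attacker's recording device
H_LOUDSPEAKER = np.array([0.9, 0.2, -0.1])   # attacker's playback device
H_IDEAL = np.array([1.0])                    # identity filter (assumed ideal playback)


def pass_through(signal, *devices):
    """Filter `signal` by each device's impulse response, in order."""
    out = np.asarray(signal, dtype=float)
    for h in devices:
        out = np.convolve(out, h)  # linear convolution = LTI filtering
    return out


def genuine(speech):
    # Live speech reaches the system's recording device exactly once.
    return pass_through(speech, H_SYSTEM_MIC)


def conventional_replay(speech):
    # Attacker records, plays back through a loudspeaker, system records again.
    return pass_through(speech, H_ATTACKER_MIC, H_LOUDSPEAKER, H_SYSTEM_MIC)


def proposed_replay(speech, playback=H_IDEAL):
    # Same recording device twice; playback assumed ideal unless given.
    return pass_through(speech, H_SYSTEM_MIC, playback, H_SYSTEM_MIC)


def log_magnitude_db(h, n_fft=512):
    """Magnitude response of an impulse response in dB (small floor avoids log(0))."""
    return 20.0 * np.log10(np.abs(np.fft.rfft(h, n_fft)) + 1e-12)


impulse = np.array([1.0])  # a unit impulse reveals each chain's overall response
g = genuine(impulse)
p = proposed_replay(impulse)
# In dB, the proposed chain's coloration is twice that of genuine speech.
print(np.allclose(log_magnitude_db(p), 2 * log_magnitude_db(g)))  # True
```

In this toy model, the proposed chain contains no attacker-recorder or loudspeaker coloration. Its only trace is the system device's response applied a second time, which shows up as a doubling of that response in dB. This is consistent with the observation above that countermeasures keyed to playback and attacker-recording attributes may miss it.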
Highlights
Automatic speaker verification (ASV) is a technique that verifies a user’s identity by analyzing his/her speech
To ensure the reliability of our results, we first evaluated our systems on the trials of ASVspoof 2017 version 2 and ASVspoof 2019 physical access (PA), which have been widely used in studies on replay attack detection
The difference between genuine and replayed speech has been investigated: genuine speech passes through the recording device embedded in a system only once, whereas the replayed speech produced for the proposed attack passes through the same recording device twice
Summary
Automatic speaker verification (ASV) is a technique that verifies a user’s identity by analyzing his/her speech. Because it requires only speech, it is relatively convenient compared with other verification techniques. It has been widely used in many smart devices that require user verification, such as smart speakers and smartphones. Competitions on spoofing attacks and countermeasures for ASV have been held regularly, and related studies have been conducted [1]–[6].