Abstract

Automatic speaker verification (ASV) systems are widely used in voice user interfaces for person identification and access control via voiceprints. A typical ASV system operates in three stages: training, enrollment, and verification. Previous work has shown that ASV systems can be bypassed at the training stage by backdoor attacks and at the verification stage by adversarial example attacks. In this paper, we propose UltraBD, a new type of backdoor attack that targets the enrollment stage via adversarial ultrasound and is highly imperceptible, synchronization-free, and content-independent. By injecting ultrasound backdoor examples while the legitimate user enrolls, the adversary pollutes the voiceprint stored in the ASV system so that it grants access to both the legitimate user and the adversary with relatively high confidence. The main challenge is that when, what, and how the legitimate user speaks during enrollment is unpredictable and varied. We address it by randomizing the synchronization time and the relative amplitude ratio during the generation and optimization of the ultrasound backdoor examples. Furthermore, we optimize the modulation mechanism of the adversarial ultrasound by tuning the baseband signal at a limited number of signal frequency points, which improves its robustness in physical-world settings. We validate UltraBD on two common datasets and two open-source ASV models. Results show that UltraBD is robust across various configurations, e.g., different speakers and utterance content. In sum, our attack calls attention to a new attack surface of ASV systems and sheds light on its fundamental mechanisms.
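The two sources of randomness named in the abstract, synchronization time and relative amplitude ratio, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mix_with_random_sync`, the circular tiling of the overlay, the peak-based definition of the amplitude ratio, and the default `ratio_range` are all assumptions made for illustration, and the sketch operates on plain sample lists rather than real ultrasound recordings.

```python
import math
import random


def mix_with_random_sync(speech, backdoor, rng, ratio_range=(0.1, 1.0)):
    """Overlay `backdoor` on `speech` at a random offset and amplitude ratio.

    Hypothetical helper: the ratio is defined here as
    peak(overlay) / peak(speech), which is an assumption of this sketch.
    """
    n = len(speech)
    m = len(backdoor)

    # Random synchronization time: where the overlay starts within the speech.
    offset = rng.randrange(n)
    # Random relative amplitude ratio between overlay and speech.
    ratio = rng.uniform(*ratio_range)

    # Tile the backdoor circularly so it covers the whole utterance,
    # shifted by the sampled offset.
    overlay = [backdoor[(i - offset) % m] for i in range(n)]

    speech_peak = max(abs(x) for x in speech)
    overlay_peak = max(abs(x) for x in overlay)
    scale = ratio * speech_peak / overlay_peak if overlay_peak > 0 else 0.0

    mixed = [s + scale * o for s, o in zip(speech, overlay)]
    return mixed, offset, ratio


# Toy signals standing in for an enrollment utterance and a backdoor signal.
speech = [math.sin(2 * math.pi * 5 * i / 200) for i in range(200)]
backdoor = [0.5 * (-1) ** i for i in range(50)]

rng = random.Random(0)
mixed, offset, ratio = mix_with_random_sync(speech, backdoor, rng)
```

Drawing a fresh offset and ratio for every simulated utterance is one plausible way to avoid relying on a fixed alignment or loudness between the injected signal and the user's speech, which is the property the abstract calls synchronization-free.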
