Abstract
DNN-based speaker recognition systems (SRSs) deployed in smart cities are vulnerable to adversarial attacks, which has raised widespread concern. An attacker can fool an SRS by adding imperceptible perturbations to benign audio. Recent studies have shown that adversarial attacks can achieve nearly 100% attack success rates in the white-box setting but perform poorly in the black-box setting. Existing attacks do not make effective use of the gradient information of the available white-box models, so the resulting adversarial examples easily overfit the white-box model they were crafted on. To tackle this problem, we propose a temporal and spatial momentum-based iterative fast gradient sign method (TSMI-FGSM). Specifically, we introduce the sample neighborhood and the interior space. During each iteration, we accumulate the gradient information of randomly sampled points in these two spaces and use it to tune and correct the update direction. Experimental results on 9 SRSs demonstrate that our method significantly enhances the transferability of adversarial examples compared with state-of-the-art attacks.
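To make the general idea concrete, here is a minimal NumPy sketch of a momentum iterative FGSM (MI-FGSM) whose per-step gradient is averaged over randomly sampled points. It is an illustration under stated assumptions, not the authors' TSMI-FGSM. Several parts are our own assumptions. The "temporal" part is read as MI-FGSM momentum accumulated across iterations. The "sample neighborhood" is modeled as uniform points in an L-infinity ball around the current adversarial example. The "interior space" is modeled as points on the segment between the benign input and the current adversarial example. The function name `tsmi_style_attack`, the placeholder `grad_fn` (standing in for the loss gradient of a white-box surrogate SRS), and the default `radius` are all hypothetical.

```python
import numpy as np


def tsmi_style_attack(x, grad_fn, eps=0.002, steps=10, mu=1.0,
                      radius=None, n_neighbors=5, n_interior=5, seed=0):
    """Hypothetical sketch, NOT the paper's exact algorithm.

    Untargeted MI-FGSM in which each step's gradient is averaged over
    (a) random points near the current adversarial example and
    (b) random points between the benign input and that example.
    grad_fn(z) must return d(loss)/dz with the same shape as z.
    """
    rng = np.random.default_rng(seed)
    radius = 1.5 * eps if radius is None else radius  # assumed neighborhood size
    alpha = eps / steps                                # per-iteration step size
    x = np.asarray(x, dtype=np.float64)
    x_adv = x.copy()
    momentum = np.zeros_like(x)

    for _ in range(steps):
        grads = [grad_fn(x_adv)]
        # (a) assumed "sample neighborhood": uniform L-inf ball around x_adv
        for _ in range(n_neighbors):
            grads.append(grad_fn(x_adv + rng.uniform(-radius, radius, x.shape)))
        # (b) assumed "interior space": points on the segment from x to x_adv
        for _ in range(n_interior):
            t = rng.uniform(0.0, 1.0)
            grads.append(grad_fn(x + t * (x_adv - x)))
        g = np.mean(grads, axis=0)

        # MI-FGSM: decay old momentum by mu, add the L1-normalized gradient
        momentum = mu * momentum + g / (np.sum(np.abs(g)) + 1e-12)
        x_adv = x_adv + alpha * np.sign(momentum)

        # Project into the eps-ball around x and the waveform range [-1, 1]
        x_adv = np.clip(x_adv, x - eps, x + eps)
        x_adv = np.clip(x_adv, -1.0, 1.0)
    return x_adv
```

As a toy usage example, a linear "loss" `w @ z` has the constant gradient `w`. Calling `tsmi_style_attack(x, lambda z: w)` then pushes every sample of `x` by up to `eps` in the direction of `sign(w)`. With a real SRS, `grad_fn` would instead come from automatic differentiation of the surrogate model's loss. Averaging over sampled points smooths the gradient of a single surrogate, and that smoothing is the intuition behind the reported gain in transferability.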