The susceptibility of Deep Neural Networks (DNNs) to adversarial attacks in Automatic Speech Recognition (ASR) systems has drawn significant attention. Most work focuses on white-box methods, which assume full access to the model architecture and parameters, an assumption that is unrealistic in real-world scenarios. Several targeted black-box attack methods have been proposed in recent years, but because ASR systems are complex, these methods rely primarily on query-based approaches with limited search capabilities, which leads to low success rates and noticeable noise. To address this, we propose DE-gradient, a new black-box approach using differential evolution (DE), a population-based search algorithm. Inspired by Semantic Web ideas, we introduce modulation noise to preserve semantic coherence while enhancing imperceptibility. In experiments on two public datasets, DE-gradient improved attack success rates by 19% and increased the signal-to-noise ratio (SNR) of silent parts from 27 dB to 54 dB, establishing a strong baseline for evaluating black-box adversarial attacks in ASR systems.
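To make the two technical ingredients named above more concrete, the following is a minimal sketch of a generic differential evolution loop (the classic DE/rand/1/bin variant) together with an SNR helper. It is not the DE-gradient method itself, whose details the abstract does not give. The function names, the hyperparameters (`pop_size`, `f`, `cr`), and the `toy_loss` objective are illustrative assumptions; in a real black-box attack the loss would be computed from queries to the ASR system.

```python
import numpy as np


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    p_signal = np.mean(np.square(signal))
    p_noise = np.mean(np.square(noise))
    return 10.0 * np.log10(p_signal / p_noise)


def differential_evolution(loss, dim, bound, pop_size=20, f=0.5, cr=0.9,
                           generations=100, seed=0):
    """Generic DE/rand/1/bin minimiser that only queries `loss` (no gradients).

    Assumption: `loss` maps a perturbation vector to a scalar score.
    """
    rng = np.random.default_rng(seed)
    # Population of candidate perturbations inside [-bound, bound].
    pop = rng.uniform(-bound, bound, size=(pop_size, dim))
    scores = np.array([loss(p) for p in pop])

    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct members other than i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + f * (b - c), -bound, bound)

            # Binomial crossover; force at least one coordinate from the mutant.
            cross = rng.random(dim) < cr
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])

            # Greedy selection: keep the trial only if it is no worse.
            trial_score = loss(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = int(np.argmin(scores))
    return pop[best], scores[best]


# Hypothetical stand-in for an ASR query: distance to a hidden target vector.
target = 0.5 * np.sin(np.arange(8))


def toy_loss(delta):
    return float(np.sum((delta - target) ** 2))


best_delta, best_score = differential_evolution(toy_loss, dim=8, bound=1.0)
print("best toy loss:", best_score)
print("SNR of unit signal vs. 0.1-amplitude noise (dB):",
      snr_db(np.ones(100), 0.1 * np.ones(100)))
```

For scale, an SNR increase from 27 dB to 54 dB is a gain of 27 dB, which corresponds to the noise power in those segments dropping by a factor of about 10^2.7 ≈ 500 relative to the signal.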