Nickel to Lego

Joon Kuy Han,Simon S Woo,Hyoungshick Kim

doi:10.1145/3319535.3363264

Abstract

Many companies offer automatic speech recognition or Speech-to-Text APIs for use in diverse applications. However, audio classification algorithms trained with deep neural networks (DNNs) can sometimes misclassify adversarial examples, posing a significant threat to critical applications. In this paper, we present a novel way to create adversarial audio examples using a genetic algorithm. Our algorithm creates adversarial examples by iteratively adding perturbations to the original audio signal. Unlike most white-box adversarial example generations, our approach does not require knowledge about the target DNN's model and parameters (black-box) and heavy computational power of GPU resources. To show the feasibility of the proposed idea, we implement a tool called, Foolgle, using a genetic algorithm that performs untargeted attacks to create adversarial audio examples and evaluate those with the state-of-the-art Google Cloud Speech-to-Text API. Our preliminary experiment results show that Foolgle deceives the API with a success probability of 86%.

Full Text