Abstract

Human-machine interaction is increasingly dependent on speech communication, mainly due to the remarkable performance of Machine Learning models in speech recognition tasks. However, these models can be fooled by adversarial examples, which are inputs intentionally perturbed to produce a wrong prediction without the changes being noticeable to humans. While much research has focused on developing new techniques to generate adversarial perturbations, less attention has been given to the aspects that determine whether and how the perturbations are noticed by humans. This question is relevant since the high fooling rates of proposed adversarial perturbation strategies are only valuable if the perturbations are not detectable. In this paper, we investigate to what extent the distortion metrics proposed in the literature for audio adversarial examples, which are commonly applied to evaluate the effectiveness of methods for generating these attacks, are a reliable measure of the human perception of the perturbations. Using an analytical framework, and an experiment in which 36 subjects evaluate audio adversarial examples according to different factors, we demonstrate that the metrics employed by convention are not a reliable measure of the perceptual similarity of adversarial examples in the audio domain.

Highlights

  • Human-computer interaction increasingly relies on Machine Learning (ML) models such as Deep Neural Networks (DNNs) trained from, usually large, datasets (Fang et al., 2018; Gao et al., 2019; Hassan et al., 2018; Nunez et al., 2018)

  • We provide evidence that standard distortion metrics employed in previous works are not a reliable measure of the perceptual distortion of audio adversarial examples in this domain, showing that more specific metrics are required in order to achieve more realistic results

  • In this paper we have addressed the measurement of the perceptual distortion of audio adversarial examples, which remains a challenging task despite being a fundamental condition for effective adversarial attacks


Summary

Introduction

Human-computer interaction increasingly relies on Machine Learning (ML) models such as Deep Neural Networks (DNNs) trained from, usually large, datasets (Fang et al., 2018; Gao et al., 2019; Hassan et al., 2018; Nunez et al., 2018). It has been demonstrated that such models can be fooled by perturbing an input sample with malicious and quasi-imperceptible perturbations. These attacks are known in the literature as adversarial examples (Goodfellow et al., 2014; Szegedy et al., 2014). The study of adversarial examples has focused primarily on the image domain and computer vision tasks (Akhtar and Mian, 2018), whereas domains such as text or audio have received much less attention. Such domains involve additional challenges and difficulties. One of the evident differences between domains is the way in which the information is represented and, consequently, the way in which adversarial perturbations are measured, bounded and perceived by human subjects.
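
To make the notion of a conventional distortion metric concrete, the sketch below (ours, not taken from the paper) computes two measures that are commonly reported for audio adversarial examples: the L-infinity norm of the perturbation and its relative loudness in decibels, dB_x(δ) = dB(δ) − dB(x). The variable names and the synthetic signal are illustrative assumptions, not part of the authors' experimental setup.

```python
# Minimal sketch of two distortion measures commonly used by convention
# to report audio adversarial perturbations. Assumes waveforms are
# floating-point arrays of equal length; names are illustrative.
import numpy as np

def linf_distortion(clean: np.ndarray, adversarial: np.ndarray) -> float:
    """Maximum absolute sample-wise difference (L-infinity norm)."""
    return float(np.max(np.abs(adversarial - clean)))

def db_distortion(clean: np.ndarray, adversarial: np.ndarray) -> float:
    """Relative loudness of the perturbation in decibels:
    dB_x(delta) = dB(delta) - dB(x), with dB(v) = 20 * log10(max|v|)."""
    delta = adversarial - clean
    db = lambda v: 20.0 * np.log10(np.max(np.abs(v)) + 1e-12)
    return float(db(delta) - db(clean))

# Usage example on a synthetic 1-second signal at 16 kHz with a tiny perturbation.
rng = np.random.default_rng(0)
clean = rng.uniform(-0.5, 0.5, size=16000)
adversarial = clean + rng.uniform(-1e-3, 1e-3, size=16000)
print(linf_distortion(clean, adversarial), db_distortion(clean, adversarial))
```

Metrics of this kind bound the perturbation numerically but say nothing directly about how it is heard, which is precisely the gap between measured distortion and perceived distortion that the paper examines.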
