Abstract

Benefiting from advances in big data, edge computing, and deep learning, automatic speech recognition (ASR) has made remarkable breakthroughs in recent years. As a result, more and more smart products have adopted speech as the interface for human-computer interaction, which has driven the popularity of edge intelligence (EI) enhanced automatic speech recognition. While people enjoy the social changes brought by speech recognition technology, a source of instability has quietly emerged: the audio adversarial example, a type of audio deliberately generated by attackers by adding subtle perturbations to the original audio signal. The added perturbations resemble noise that humans cannot perceive, yet they cause the ASR system to produce a wrong transcription. This thesis proposes three detection algorithms for audio adversarial examples: a robust detection algorithm based on WER (word error rate), a feature detection algorithm based on ADR (adversarial ratio), and a collaborative detection algorithm based on a neural network. The experimental results show that all three proposed algorithms discriminate audio adversarial examples well and achieve high AUC scores. Among them, collaborative detection performs best and feature detection performs worst. In addition, we found that the robust detection algorithm tends to have a higher accuracy score but a lower recall score, while the feature detection algorithm tends to show the converse. Moreover, because the proposed collaborative detection method combines the advantages of the robust detection and feature detection methods, it performs better with respect to accuracy, recall, and F1 score.
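To make the WER quantity concrete, the following is a minimal sketch of how a word error rate could be computed and thresholded. It is not the thesis's implementation: the abstract does not say which transcriptions are compared or how the decision is made, so the function names `word_error_rate` and `looks_adversarial`, the idea of comparing a transcription of the raw audio with one of a transformed copy, and the threshold of 0.5 are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        # Convention chosen for this sketch: empty reference is only
        # error-free if the hypothesis is empty too.
        return 0.0 if not hyp else float("inf")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def looks_adversarial(transcript_raw: str,
                      transcript_transformed: str,
                      threshold: float = 0.5) -> bool:
    """Hypothetical decision rule (an assumption, not from the thesis):
    flag the input if the two transcriptions disagree strongly."""
    return word_error_rate(transcript_raw, transcript_transformed) > threshold
```

Under these assumptions, a benign recording whose transcription barely changes after a transformation yields a WER near 0, while an input whose transcription changes substantially yields a high WER and is flagged. The threshold would have to be tuned on labeled data.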
