Abstract

The growing number of voice-controlled devices (VCDs), e.g., Google Home and Amazon Alexa, has driven the automation of home appliances, smart gadgets, and next-generation vehicles. However, VCDs and voice-activated services such as chatbots are vulnerable to audio replay attacks. Our vulnerability analysis of VCDs shows that these replays can be exploited in multi-hop scenarios to maliciously access the devices and nodes attached to the Internet of Things. To protect VCDs and voice-activated services, there is an urgent need for reliable and computationally efficient methods of detecting replay attacks. This paper models replay attacks as a nonlinear process that introduces higher-order harmonic distortions. To detect these distortions, we propose acoustic ternary patterns-gammatone cepstral coefficient (ATP-GTCC) features, which are capable of capturing the distortions caused by replay attacks. An error-correcting output codes (ECOC) model is used to train a multi-class SVM classifier on the proposed ATP-GTCC feature space, and the classifier is tested for voice replay attack detection. The performance of the proposed framework is evaluated on the ASVspoof 2019 dataset and on our own voice spoofing detection corpus (VSDC), which consists of bona fide, first-order replay (replayed once), and second-order replay (replayed twice) audio recordings. Experimental results indicate that the proposed audio replay detection framework reliably detects both first- and second-order replay attacks and can be used on resource-constrained devices.
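The modelling assumption stated above, that a replay acts as a nonlinear process adding higher-order harmonics, can be illustrated with a toy sketch. This is not the ATP-GTCC pipeline or the authors' replay model: the memoryless polynomial channel, its coefficients `a2` and `a3`, the 500 Hz test tone, and all function names are assumptions chosen purely for illustration.

```python
import cmath
import math

SAMPLE_RATE = 16000   # Hz (assumed for this toy example)
F0 = 500              # Hz, fundamental of the test tone (assumed)
N = 1600              # 50 full periods of F0, so harmonics fall on exact DFT bins


def tone(freq_hz, sample_rate, n_samples):
    """Unit-amplitude sine wave standing in for a bona fide signal."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


def nonlinear_channel(x, a2=0.2, a3=0.1):
    """Hypothetical playback/recording chain: a memoryless polynomial.

    The quadratic and cubic terms are what create energy at 2*F0 and 3*F0.
    """
    return [s + a2 * s ** 2 + a3 * s ** 3 for s in x]


def magnitude_at(x, freq_hz, sample_rate):
    """Single-frequency DFT, scaled so a sinusoid of amplitude A returns A."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate)
              for k, s in enumerate(x))
    return 2 * abs(acc) / len(x)


def harmonic_profile(x):
    """Magnitudes at the fundamental, second, and third harmonic."""
    return [magnitude_at(x, h * F0, SAMPLE_RATE) for h in (1, 2, 3)]


bona_fide = tone(F0, SAMPLE_RATE, N)
first_order = nonlinear_channel(bona_fide)      # replayed once
second_order = nonlinear_channel(first_order)   # replayed twice

profiles = {
    "bona fide": harmonic_profile(bona_fide),
    "first-order replay": harmonic_profile(first_order),
    "second-order replay": harmonic_profile(second_order),
}

for name, (h1, h2, h3) in profiles.items():
    print(f"{name:>20}: F0={h1:.4f}  2*F0={h2:.4f}  3*F0={h3:.4f}")
```

Under these assumptions the clean tone has essentially no energy at 2·F0 or 3·F0, one pass through the channel adds some, and a second pass adds more. This growth in harmonic content with replay order is the kind of cue a replay detector could exploit.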
