Abstract

Voice spoofing attacks have grown rapidly in the last few years. Attackers use techniques such as speech synthesis, in which machine-generated speech imitating a target speaker is used to fool automatic speaker verification (ASV) systems that guard applications such as home control and bank account access. Modern tools have made such attacks easy to launch. To counter synthetic-speech attacks on ASV systems, we propose a synthetic speech detector based on a fusion of spectral features. Specifically, we represent the audio signal with a fused feature vector consisting of mel-frequency cepstral coefficients (MFCC), gammatone cepstral coefficients (GTCC), spectral flux, and spectral centroid. This fused feature set captures both the speech variation attributes of genuine signals and the algorithmic artifacts of synthetic signals. The features are then used to train a bidirectional long short-term memory (BiLSTM) network that classifies each signal as genuine or spoofed. The proposed framework detects both voice conversion and synthetic speech attacks on ASV systems. We evaluate the framework on the ASVspoof 2019 LA dataset. Our experimental results illustrate its effectiveness in detecting logical access attacks (voice conversion and cloned/synthetic voice).
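As a rough illustration of two of the four features named in the abstract, the sketch below computes per-frame spectral centroid and spectral flux with NumPy and stacks them into one feature matrix. It is not the authors' implementation: the function names, the 16 kHz sample rate, the 25 ms frame length, the 10 ms hop, and the Hann window are all assumptions chosen for illustration, and the MFCC and GTCC components of the fused vector are omitted.

```python
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    """Split a 1-D signal into overlapping frames (sizes are assumed, not from the paper)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def spectral_features(x, sr=16000, frame_len=400, hop=160):
    """Return an (n_frames, 2) array: [spectral centroid in Hz, spectral flux]."""
    frames = frame_signal(x, frame_len, hop) * np.hanning(frame_len)
    mag = np.abs(np.fft.rfft(frames, axis=1))        # magnitude spectrum per frame
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)   # bin centre frequencies in Hz

    # Spectral centroid: magnitude-weighted mean frequency of each frame.
    centroid = (mag * freqs).sum(axis=1) / (mag.sum(axis=1) + 1e-10)

    # Spectral flux: Euclidean distance between consecutive magnitude spectra.
    # The first frame has no predecessor, so its flux is defined as 0 here.
    diff = np.diff(mag, axis=0, prepend=mag[:1])
    flux = np.sqrt((diff ** 2).sum(axis=1))

    # Hypothetical fusion step: concatenate per-frame features column-wise.
    return np.stack([centroid, flux], axis=1)
```

In a fuller pipeline, per-frame MFCC and GTCC vectors would be concatenated along the same axis, and the resulting frame sequence could be passed to a sequence classifier such as a BiLSTM.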
