Abstract Speaker verification, like other biometric technologies, is vulnerable to spoofing attacks. An attacker impersonates a specific target speaker using impersonation, replay, Text-to-Speech (TTS), or Voice conversion (VC) techniques to gain unauthorized access to the system. The work in this paper, proposes a solution that uses an amalgamation of Cochleagram and Residual Network (ResNet) to implement the frontend feature extraction phase of an Audio Spoof Detection (ASD) system. Cochleagram generation, feature extraction-dimensionality reduction and classification are the three main phases of the proposed ASD system. In the first phase, the recorded audios have been converted into Cochleagrams by using Equivalent Rectangular Bandwidth (ERB) based gammatone filters. In the next phase, three variants of Residual Network (ResNet), ResNet50, ResNet41 and ResNet27, one by one, have been used for extracting dynamic features and Resnet50, ResNet41 that are fed to Linear Decrement Analysis (LDA) for dimensionality reduction. At last, in the classification phase, the LDA reduced features have been used for training four different machine learning classifiers such as Random Forest, Naïve Bayes, K-Nearest Neighbour (KNN), and eXtreme Gradient Boosting (XGBoost), individually. The proposed work in this paper concentrates on synthetic, replay, and deepfake attacks. The state-of-the-art ASVspoof 2019 Logical Access (LA), Physical Access (PA), Voice Spoofing Detection Corpus (VSDC) and DEepfake CROss-lingual (DECRO) datasets are utilised for training and testing the proposed ASD system. Additionally, we have assessed the performance of our proposed system under the influence of additive noise. Airplane noise at different SNR rate (0, dB 5dB, 10dB and -5dB) has been added to training and testing audios for the same. From the obtained results, it can be concluded that combination of Cochleagram and ResNet50 with XGBoost classifier outperforms all other implemented systems for detecting fake audios under noisy environment.
Read full abstract