Abstract

Nowadays, fingerprint and retina scans are the most reliable and widely used biometric authentication systems. Another emerging biometric approach for authentication, though vulnerable, is speech-based systems. However, speech-based systems are more prone to spoofing attacks. Through such malicious attacks, an unauthentic person tries to present himself as legitimate user, in order to acquire illegitimate advantage. Therefore, these attacks, created by synthesis or conversion of speech, pose an enormous threat to the reliable functioning of automatic speaker verification (ASV) authentication systems. The work presented in this paper tries to address this problem by using deep learning (DL) methods and ensemble of different neural networks. The first model is a combination of time-distributed dense layers and long short-term memory (LSTM) layers. The other two deep neural networks (DNNs) are based on temporal convolution (TC) and spatial convolution (SC). Finally, an ensemble model comprising of these three DNNs has also been analysed. All these models are analysed with Mel frequency cepstral coefficients (MFCC), inverse Mel frequency cepstral coefficients (IMFCC) and constant Q cepstral coefficients (CQCC) at the frontend, where the proposed ensemble performs best with CQCC features. The proposed work uses ASVspoof 2015 and ASVspoof 2019 datasets for training and testing, with the evaluation set having speech synthesis (SS) and voice conversion (VC) attacked utterances. Performance of proposed system trained with ASVspoof 2015 dataset degrades with evaluation set of ASVspoof 2019 dataset, whereas performance of the same system improves when training is also done with the ASVspoof 2019 dataset. Also, a joint ASVspoof 1519 dataset is created to add more variations into the single dataset, and it has been observed that the trained ensemble with this joint dataset performs even better, while performing evaluation using single datasets. This research promotes the development of systems that can cope with completely unknown data in testing. It has been further observed that the models can reach promising results, paving the way for future research in this domain using DL.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.