Abstract
Automatic speaker verification (ASV) is an emerging biometric verification technique with a growing range of applications. However, both verification accuracy and anti-spoofing must be considered carefully before ASV is put into practice. Here, anti-spoofing refers to replay detection: detecting voice that has been recorded, stored and replayed to deceive an ASV system. A cascaded decision of anti-spoofing followed by ASV is a straightforward way to tackle the two issues. In this paper, a joint decision of anti-spoofing and ASV was investigated in a multi-task learning framework with a contrastive loss in order to improve on the cascaded decision approach. A modified triplet loss was first constructed to supervise deep neural networks to extract embedding vectors that carry information about both speaker identity and spoofing. These embedding vectors were then used as input features by back-end classifiers for speaker and spoofing classification. Experimental results on both ASVspoof 2017 and ASVspoof 2019 showed that the proposed joint decision approach with triplet loss outperformed the corresponding baselines, a recent joint decision approach with Gaussian back-end fusion, and our previous joint decision approach with cross-entropy loss.
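The abstract does not spell out how the triplet loss was modified, so the Python sketch below shows only a standard triplet margin loss, max(0, d(a, p) - d(a, n) + margin), applied under an assumed joint labelling: two utterances count as the same class only if they share both speaker identity and spoofing status (bona fide vs. replayed). Such a loss pulls same-speaker, same-status embeddings together and pushes apart embeddings that differ in either respect, which is one plausible way an embedding could come to encode both kinds of information. The function names, the Euclidean distance, the exhaustive triplet enumeration and the margin of 1.0 are all illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical joint label: (speaker_id, is_replayed). This labelling is an
# assumption for illustration; the paper's exact loss modification is not given.
Label = Tuple[str, bool]
Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def mine_triplets(labels: List[Label]) -> List[Tuple[int, int, int]]:
    """Enumerate every (anchor, positive, negative) index triplet.

    A positive shares the anchor's full joint label (same speaker AND same
    spoofing status); a negative differs in speaker, spoofing status, or both.
    """
    triplets = []
    for a, la in enumerate(labels):
        for p, lp in enumerate(labels):
            if p == a or lp != la:
                continue
            for n, ln in enumerate(labels):
                if ln != la:
                    triplets.append((a, p, n))
    return triplets


def batch_triplet_loss(embeddings: List[Vector], labels: List[Label],
                       margin: float = 1.0) -> float:
    """Mean triplet loss over all valid triplets in a batch (0.0 if none exist)."""
    triplets = mine_triplets(labels)
    if not triplets:
        return 0.0
    total = sum(triplet_loss(embeddings[a], embeddings[p], embeddings[n], margin)
                for a, p, n in triplets)
    return total / len(triplets)


if __name__ == "__main__":
    # Toy batch: two bona fide utterances of spk1 and one replayed utterance of spk1.
    labels = [("spk1", False), ("spk1", False), ("spk1", True)]
    separated = [[0.0, 0.0], [0.0, 1.0], [5.0, 0.0]]  # replay embedding far away
    collapsed = [[0.0, 0.0], [0.0, 1.0], [0.0, 0.5]]  # replay embedding too close
    print(batch_triplet_loss(separated, labels))  # -> 0.0
    print(batch_triplet_loss(collapsed, labels))  # -> 1.5
```

Exhaustive enumeration grows cubically with batch size and is used here only for clarity; practical training usually mines hard or semi-hard triplets within each mini-batch instead.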
Highlights
With the development of engineering applications of artificial intelligence, biometric authentication, such as fingerprint and face recognition, is becoming popular for protecting the security of computers, smart devices, and networks
TABLE 6 presents the equal error rate (EER) results obtained on ASVspoof 2017 with different features and convolutional neural network (CNN), deep neural network (DNN) or time-delay neural network (TDNN) models trained with cross-entropy or triplet loss
For anti-spoofing, our system achieved an EER of 11.89% using mel-frequency cepstral coefficients (MFCC) and a TDNN trained with triplet loss, compared with 24.35% in [17]
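The EER figures above (11.89% versus 24.35%) are the operating points at which the false acceptance rate and the false rejection rate coincide. The Python sketch below estimates EER from detection scores by sweeping thresholds over the observed scores. The function name, the higher-score-means-bona-fide convention and the toy scores are assumptions for illustration; this coarse sweep is not the official ASVspoof scoring code.

```python
from typing import Sequence, Tuple


def compute_eer(target_scores: Sequence[float],
                nontarget_scores: Sequence[float]) -> Tuple[float, float]:
    """Approximate the equal error rate (EER) by a threshold sweep.

    Assumed convention: a higher score means "more likely bona fide / target".
    - false acceptance rate (FAR): share of non-target (e.g. replayed) scores >= threshold
    - false rejection rate (FRR): share of target (e.g. bona fide) scores < threshold
    Returns (eer, threshold) at the observed score where |FAR - FRR| is smallest,
    reporting the mean of FAR and FRR there as the EER estimate.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")
    best_gap, best_eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, best_eer, best_thr = abs(far - frr), (far + frr) / 2, thr
    return best_eer, best_thr


if __name__ == "__main__":
    bona_fide = [0.9, 0.8, 0.7, 0.3]  # toy scores, not from the paper
    replayed = [0.6, 0.4, 0.2, 0.1]
    eer, thr = compute_eer(bona_fide, replayed)
    print(f"EER = {eer:.2%} at threshold {thr}")  # -> EER = 25.00% at threshold 0.6
```

With only a handful of scores, FAR and FRR move in coarse steps, so this estimate is rough; larger evaluation sets make the crossing point much better defined.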
Summary
With the development of engineering applications of artificial intelligence, biometric authentication, such as fingerprint and face recognition, is becoming popular for protecting the security of computers, smart devices, and networks. Automatic speaker verification (ASV) is a conventional way to put voiceprints into practical use: it verifies the claimed identity of a speaker by recording the speaker's voice, extracting a voiceprint and computing similarity scores. However, biometric systems are vulnerable to spoofing attacks, for example deceiving a face recognition system by showing a photo of an authenticated user to the camera, or attacking an ASV system by playing back a recording of a verified user [3], [4]. Since replay attacks are easy to implement and replayed audio is highly similar to bona fide speech, such attacks are difficult to detect and pose serious threats to ASV systems [5]. Anti-spoofing should therefore be considered carefully before ASV is put into practical use.