Toward robust replay attack detection in Automatic Speaker Verification: A study of spectrum estimation and channel magnitude response modeling

Similar Papers
  • Conference Article
  • Citations: 4
  • 10.1109/fit53504.2021.00036
Automatic Speaker Verification and Replay Attack Detection System using novel Glottal Flow Cepstrum Coefficients
  • Dec 1, 2021
  • Yusra Banaras + 2 more

With the growing number of devices that use voice biometrics for verification, it has become essential to accurately verify the target speaker. Moreover, intruders use various spoofing attacks to fool automatic speaker verification (ASV) systems. The voice replay attack, being the easiest spoofing attack to mount, is frequently presented to the ASV systems of these devices to gain access for malicious purposes. We therefore need robust systems capable of verifying the target speaker and distinguishing bonafide from replayed speech. To combat these vulnerabilities, we propose an automatic speaker verification and replay attack detection (ASVRAD) system. For this purpose, we propose a novel audio feature, Glottal Flow Gammatone Cepstrum Coefficients (GLGTCC), an enhanced version of Glottal Flow Cepstrum Coefficients (GLFCC), for accurate classification of bonafide speakers and detection of voice replay attacks. The GLGTCC features reliably capture speaker-specific attributes from the vocal excitation patterns in the audio. We then employ a multiclass support vector machine (SVM) to classify the bonafide speakers, and a binary SVM to separate bonafide from replayed audio. On the standard ASVspoof 2019 PA dataset, our system achieves the lowest EERs among the comparative methods: 3.00% for speaker verification and 18.38% for replay attack detection.
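As an illustrative aside, the binary bonafide-versus-replay decision described in this abstract can be sketched with a standard SVM classifier. The features below are synthetic placeholders, not GLGTCC features, and the pipeline is a minimal sketch under those assumptions rather than the authors' implementation:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-in "utterance-level cepstral features": replayed recordings are
# modeled as a shifted cluster, loosely mimicking added channel effects.
bonafide = rng.normal(loc=0.0, scale=1.0, size=(100, 20))
replay = rng.normal(loc=1.5, scale=1.0, size=(100, 20))

X = np.vstack([bonafide, replay])
y = np.array([0] * 100 + [1] * 100)  # 0 = bonafide, 1 = replay

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```

In the paper's system the same idea appears twice: a multiclass SVM over enrolled speakers and a binary SVM for the bonafide/replay decision.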

  • Research Article
  • Citations: 13
  • 10.1016/j.engappai.2022.105469
Waveform level adversarial example generation for joint attacks against both automatic speaker verification and spoofing countermeasures
  • Oct 11, 2022
  • Engineering Applications of Artificial Intelligence
  • Xingyu Zhang + 5 more


  • Research Article
  • Citations: 15
  • 10.1016/j.specom.2019.09.004
Privacy-preserving PLDA speaker verification using outsourced secure computation
  • Oct 1, 2019
  • Speech Communication
  • Amos Treiber + 4 more


  • Research Article
  • Citations: 24
  • 10.1109/access.2020.2974290
A New Replay Attack Against Automatic Speaker Verification Systems
  • Jan 1, 2020
  • IEEE Access
  • Sung-Hyun Yoon + 3 more

With the increasing popularity of automatic speaker verification (ASV), the reliability of ASV systems has also gained importance. ASV is vulnerable to various spoofing attacks, especially replay attacks. Thus, recent public competitions and studies on spoofing attack detection for ASV have mainly focused on the detection of replay attacks. Generally, replayed speech includes the attributes of one playback and two recording devices: the playback device, the recording device used by the attacker, and the recording device embedded in any system to verify input utterances. Therefore, the main attributes differentiating replayed speech from genuine speech are those of the playback and recording devices used by the attacker. In this paper, we propose a novel replay attack, and its defense, through observation of the general speech-spoofing process. The proposed attack includes only the attribute of the one recording device embedded in an ASV system: genuine speech passes through that recording device only once, while the replayed speech produced for the proposed attack passes through the same recording device twice. Because the proposed attack is feasible, it should be considered a new task when training replay countermeasures, in order to develop a robust ASV protection system. The experimental results show that this novel replay attack cannot be detected by several of the existing state-of-the-art replay attack detection systems. Furthermore, the new attack can be successfully detected by the same systems if they are retrained with an appropriate dataset designed for the new task.
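Several of the papers on this page report results as an equal error rate (EER): the operating point at which the false acceptance rate (FAR) equals the false rejection rate (FRR). A minimal sketch of computing it on synthetic scores, assuming higher scores mean "accept":

```python
import numpy as np

def compute_eer(genuine, spoof):
    """Sweep the decision threshold; return the error rate where FAR meets FRR."""
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    rates = []
    for t in thresholds:
        far = np.mean(spoof >= t)    # spoofed trials wrongly accepted
        frr = np.mean(genuine < t)   # genuine trials wrongly rejected
        rates.append((abs(far - frr), (far + frr) / 2))
    # Report the average of FAR and FRR at the threshold where they are closest.
    return min(rates)[1]

rng = np.random.default_rng(1)
genuine = rng.normal(2.0, 1.0, 1000)  # genuine trials tend to score higher
spoof = rng.normal(0.0, 1.0, 1000)
print(f"EER = {compute_eer(genuine, spoof):.3f}")
```

With two unit-variance Gaussians separated by 2, the EER lands near 0.16, matching the analytic crossover at the midpoint threshold.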

  • Research Article
  • Citations: 17
  • 10.1007/s10772-011-9120-6
Speaker verification under degraded condition: a perceptual study
  • Oct 13, 2011
  • International Journal of Speech Technology
  • Gayadhar Pradhan + 1 more

This study analyzes the effect of degradation on human and automatic speaker verification (SV) tasks. The perceptual test is conducted by subjects having knowledge of speaker verification. An automatic SV system is developed using Mel-frequency cepstral coefficients (MFCC) and a Gaussian mixture model (GMM). Human and automatic speaker verification performances are compared for clean training and different degraded test conditions. Speech signals are reconstructed in clean and degraded conditions by highlighting different speaker-specific information and compared through the perceptual test. The perceptual cues that the human subjects used as speaker-specific information are investigated, and their importance in degraded conditions is highlighted. The difference in the nature of human and automatic SV tasks is investigated in terms of falsely accepted and falsely rejected speech pairs. A discussion on human versus automatic speaker verification is carried out, and possibilities for improving automatic speaker verification performance under degraded conditions are suggested.

  • Conference Article
  • Citations: 43
  • 10.1109/infocom41043.2020.9155483
Voiceprint Mimicry Attack Towards Speaker Verification System in Smart Home
  • Jul 1, 2020
  • Lei Zhang + 5 more

The advancement of voice controllable systems (VCSes) has dramatically affected our daily lifestyle and catalyzed the smart home's deployment. Currently, most VCSes exploit automatic speaker verification (ASV) to prevent various voice attacks (e.g., replay attacks). In this study, we present VMask, a novel and practical voiceprint mimicry attack that can fool ASV in a smart home and inject malicious voice commands disguised as a legitimate user. The key observation behind VMask is that the deep learning models utilized by ASV are vulnerable to subtle perturbations in the voice input space. To generate these subtle perturbations, VMask leverages the idea of adversarial examples. By adding the subtle perturbations to recordings from an arbitrary speaker, VMask can mislead the ASV into classifying the crafted speech samples, which to human listeners still sound like the original speaker, as the targeted victim. Moreover, psychoacoustic masking is employed to keep the adversarial perturbations under the human perception threshold, thus making the victim unaware of ongoing attacks. We validate the effectiveness of VMask by performing comprehensive experiments on both grey-box (VGGVox) and black-box (Microsoft Azure Speaker Verification) ASVs. Additionally, a real-world case study on Apple HomeKit proves VMask's practicability on smart home platforms.

  • Research Article
  • Citations: 21
  • 10.1109/taslp.2021.3133189
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning
  • Jan 1, 2022
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • Haibin Wu + 5 more

Previous works have shown that automatic speaker verification (ASV) is seriously vulnerable to malicious spoofing attacks, such as replay, synthetic speech, and the recently emerged adversarial attacks. Great efforts have been dedicated to defending ASV against replay and synthetic speech; however, only a few approaches have been explored to deal with adversarial attacks. All existing approaches to tackling adversarial attacks on ASV require knowledge of how the adversarial samples are generated, but it is impractical for defenders to know the exact attack algorithms applied by in-the-wild attackers. This work is among the first to perform adversarial defense for ASV without knowing the specific attack algorithms. Inspired by self-supervised learning models (SSLMs), which possess the merits of alleviating superficial noise in the inputs and reconstructing clean samples from corrupted ones, this work regards adversarial perturbations as one kind of noise and conducts adversarial defense for ASV using SSLMs. Specifically, we propose to perform adversarial defense from two perspectives: 1) adversarial perturbation purification and 2) adversarial perturbation detection. The purification module aims at alleviating the adversarial perturbations in the samples and pulling contaminated adversarial inputs back towards the decision boundary. Experimental results show that the proposed purification module effectively counters adversarial attacks and outperforms traditional filters in both alleviating the adversarial noise and maintaining performance on genuine samples. The detection module aims at separating adversarial samples from genuine ones based on the statistical properties of ASV scores derived by a unique ASV integrated with different numbers of SSLMs. Experimental results show that the detection module helps shield the ASV by detecting adversarial samples. Both purification and detection methods are helpful for defending against different kinds of attack algorithms. Moreover, since there is no common metric for evaluating ASV performance under adversarial attacks, this work also formalizes evaluation metrics for adversarial defense, taking both purification- and detection-based approaches into account. We sincerely encourage future works to benchmark their approaches within the proposed evaluation framework.

  • Conference Article
  • Citations: 24
  • 10.1109/icassp.1996.540304
Speaker recognition in reverberant enclosures
  • May 7, 1996
  • P.J Castellano + 2 more

This paper evaluates the effects of room reverberation on two automatic speaker verification (ASV) applications. Reverberation is simulated using an image method, and ASV is conducted with a multiple binary classifier model. Results are strongly dependent on speaker location, room size, and reverberation time. ASV performance is poor given anechoic training but reverberant test speech. In voice-activated security access, speaker locations can be identical for ASV training and testing, eliminating differences between the respective impulse responses; reverberation then does not affect ASV even when it renders speech unintelligible. In covert speaker verification, speakers are uncooperative and mobile, so the test room responses are unknown. By subjecting training speech to an impulse response corresponding to the centre of the room, ASV degrades by only 5.45 percent, instead of the 13.7 percent observed when anechoic training speech is used. Subjecting training speech to responses corresponding to two room locations does not further improve ASV performance.
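The channel effect studied in this paper, reverberation, amounts to convolving the speech with a room impulse response. The sketch below uses a toy exponentially decaying noise tail, not an image-method impulse response, purely to illustrate the convolution step:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 16000
speech = rng.normal(size=fs)       # 1 s of stand-in "speech"

# Toy impulse response: direct path plus a 300 ms exponentially decaying
# tail of random reflections (a real RIR would come from the image method).
t = np.arange(int(0.3 * fs)) / fs
h = rng.normal(size=t.size) * np.exp(-t / 0.05)
h[0] = 1.0                          # direct path

reverberant = np.convolve(speech, h)[:speech.size]
print(reverberant.shape)
```

Training/test mismatch then corresponds to scoring utterances filtered by one impulse response against models trained on another (or on none).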

  • Research Article
  • Citations: 2
  • 10.3906/elk-1805-113
Limited-data automatic speaker verification algorithm using band-limited phase-only correlation function
  • Jul 26, 2019
  • TURKISH JOURNAL OF ELECTRICAL ENGINEERING & COMPUTER SCIENCES
  • Ángel David Pedroza Ramírez + 3 more

In this paper, a new method to deal with automatic speaker verification based on band-limited phase-only correlation (BLPOC) is proposed. The aim of this study is to validate the use of the BLPOC function as a new limited-data automatic speaker verification technique. Although some speaker verification techniques have high accuracy, efficiency usually depends on the extraction of complex theoretical information from speech signals and the amount of the data for training the algorithms. The BLPOC function is a high-accuracy biometric technique traditionally implemented in human identification by fingerprints (through image-matching). When applying the BLPOC function in automatic speaker verification through the proposed algorithms (under limited-data conditions), a 98.24% true acceptance rate (TAR) and 87.17% true rejection rate (TRR) in a custom database (and 93.75% TAR and 67.05% TRR in the ELSDSR database) were obtained. The proposed algorithm is a theoretically simple method for automatic speaker verification whose main advantage is that it can provide identification under limited-data conditions. In this sense, the BLPOC function could be applicable in other limited-data biometric identifications by sound signals.
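Phase-only correlation, the core of BLPOC, normalizes the cross-power spectrum to unit magnitude so that only phase differences remain; matched signals then produce a sharp correlation peak. A minimal sketch on synthetic signals, without the band-limiting step:

```python
import numpy as np

def poc(x, y, eps=1e-12):
    """Phase-only correlation: unit-magnitude cross-spectrum, then inverse FFT."""
    X, Y = np.fft.fft(x), np.fft.fft(y)
    cross = X * np.conj(Y)
    return np.real(np.fft.ifft(cross / (np.abs(cross) + eps)))

rng = np.random.default_rng(0)
sig = rng.normal(size=1024)
r_match = poc(sig, sig)                       # identical signals: peak ~1 at lag 0
r_mismatch = poc(sig, rng.normal(size=1024))  # unrelated signals: no clear peak
print(r_match.max(), r_mismatch.max())
```

A band-limited variant would discard frequency bins outside the effective speech band before the inverse FFT, which is the modification the BLPOC function makes.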

  • Conference Article
  • Citations: 3
  • 10.21437/odyssey.2022-13
Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion
  • Jun 28, 2022
  • Haibin Wu + 7 more

Recent years have witnessed the extraordinary development of automatic speaker verification (ASV). However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, while the recently proposed high-performance spoofing countermeasure (CM) models focus solely on the standalone anti-spoofing task and ignore the subsequent speaker verification process. How to integrate CM and ASV remains an open question. A spoofing-aware speaker verification (SASV) challenge has recently taken place with the argument that better performance can be delivered when both CM and ASV subsystems are optimized jointly. Under the challenge's scenario, the integrated systems proposed by the participants are required to reject both impostor speakers and spoofing attacks from target speakers, which intuitively and effectively matches the expectation of a reliable, spoofing-robust ASV system. This work focuses on fusion-based SASV solutions and proposes a multi-model fusion framework to leverage the power of multiple state-of-the-art ASV and CM models. The proposed framework improves the SASV-EER from 8.75% to 1.17%, an 86% relative improvement over the best baseline system in the SASV challenge.

  • Research Article
  • Citations: 36
  • 10.1109/taslp.2017.2760243
Robust Voice Liveness Detection and Speaker Verification Using Throat Microphones
  • Jan 1, 2018
  • IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • Md Sahidullah + 6 more

While having a wide range of applications, automatic speaker verification (ASV) systems are vulnerable to spoofing attacks, in particular, replay attacks, which are effective and easy to implement. Most prior work on detecting replay attacks uses audio from a single acoustic microphone only, leading to difficulties in detecting high-end replay attacks that are close to indistinguishable from live human speech. In this paper, we study the use of a special body-conducted sensor, the throat microphone (TM), for combined voice liveness detection (VLD) and ASV in order to improve both the robustness and security of ASV against replay attacks. We first investigate the possibility and methods of attacking a TM-based ASV system, followed by a pilot data collection. Second, we study the use of spectral features for VLD using both single-channel and dual-channel ASV systems. We carry out speaker verification experiments using Gaussian mixture model with universal background model (GMM-UBM) and i-vector based systems on a dataset of 38 speakers collected by us. We achieve considerable improvement in recognition accuracy with the dual-microphone setup. In experiments with noisy test speech, the false acceptance rate (FAR) of the dual-microphone GMM-UBM based system for recorded speech reduces from 69.69% to 18.75%. The FAR in the replay condition further drops to 0% when this dual-channel ASV system is integrated with the new dual-channel voice liveness detector.

  • Conference Article
  • Citations: 1
  • 10.21437/interspeech.2022-602
Two Methods for Spoofing-Aware Speaker Verification: Multi-Layer Perceptron Score Fusion Model and Integrated Embedding Projector
  • Sep 18, 2022
  • Jungwoo Heo + 2 more

The use of deep neural networks (DNNs) has dramatically elevated the performance of automatic speaker verification (ASV) over the last decade. However, ASV systems can be easily neutralized by spoofing attacks. Therefore, the Spoofing-Aware Speaker Verification (SASV) challenge was designed and held to promote the development of systems that perform ASV while considering spoofing attacks, by integrating ASV and spoofing countermeasure (CM) systems. In this paper, we propose two back-end systems: a multi-layer perceptron score fusion model (MSFM) and an integrated embedding projector (IEP). The MSFM, a score fusion back-end, derives the SASV score from ASV and CM scores and embeddings. The IEP, on the other hand, combines ASV and CM embeddings into a single SASV embedding and calculates the final SASV score based on cosine similarity. We effectively integrated the ASV and CM systems through the proposed MSFM and IEP, achieving SASV equal error rates of 0.56% and 1.32% on the official evaluation trials of the SASV 2022 challenge.
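The IEP idea of scoring with cosine similarity over a combined ASV + CM embedding can be sketched as follows. The dimensions are hypothetical placeholders, the combination shown is plain concatenation rather than the paper's learned projector, and the random vectors merely stand in for learned embeddings:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
asv_enroll = rng.normal(size=192)   # placeholder ASV embedding dimension
cm_enroll = rng.normal(size=160)    # placeholder CM embedding dimension
sasv_enroll = np.concatenate([asv_enroll, cm_enroll])

# Same-speaker bonafide test: close to enrollment (small perturbation).
sasv_target = sasv_enroll + 0.1 * rng.normal(size=sasv_enroll.size)
# Impostor or spoofed test: an unrelated embedding.
sasv_nontarget = rng.normal(size=sasv_enroll.size)

print(cosine(sasv_enroll, sasv_target), cosine(sasv_enroll, sasv_nontarget))
```

Thresholding this single score then rejects both impostors and spoofed trials in one decision, which is the SASV setting described above.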

  • Research Article
  • 10.1371/journal.pone.0241809
Short-time speaker verification with different speaking style utterances
  • Nov 11, 2020
  • PLoS ONE
  • Hongwei Mao + 5 more

In recent years, great progress has been made in the technical aspects of automatic speaker verification (ASV). However, deploying ASV technology widely is still very challenging, because most techniques remain sensitive to new, unknown, and spoofing conditions. Most previous studies focused on extracting target-speaker information from natural speech. This paper aims to design a new ASV corpus with multiple speaking styles and investigate ASV robustness to these different styles. We first release this corpus on the Zenodo website for public research; each speaker has several text-dependent and text-independent singing, humming, and normal reading speech utterances. Then, we investigate the speaker discrimination of each speaking style in the feature space. Furthermore, the intra- and inter-speaker variabilities within each speaking style and across speaking styles are investigated in both text-dependent and text-independent ASV tasks. A conventional Gaussian Mixture Model (GMM) and the state-of-the-art x-vector are used to build ASV systems. Experimental results show that the voiceprint information in humming and singing speech is more distinguishable than that in normal reading speech for conventional ASV systems. Furthermore, we find that combining the three speaking styles can significantly improve the x-vector based ASV system, even though only limited gains are obtained with conventional GMM-based systems.

  • Research Article
  • Citations: 9
  • 10.1016/j.csl.2019.101024
Voice biometrics security: Extrapolating false alarm rate via hierarchical Bayesian modeling of speaker verification scores
  • Oct 3, 2019
  • Computer Speech & Language
  • Alexey Sholokhov + 3 more


  • Research Article
  • Citations: 23
  • 10.1016/j.csl.2020.101105
Replay spoofing countermeasure using autoencoder and siamese networks on ASVspoof 2019 challenge
  • May 15, 2020
  • Computer Speech & Language
  • Mohammad Adiban + 2 more

